NOVEL BACTERIAL GENES AND PROTEINS THAT ARE ESSENTIAL FOR
CELL VIABILITY AND THEIR USES 



Throughout this application various publications are referenced. The disclosures of these 
publications in their entireties are hereby incorporated by reference into this application in 
order to more fully describe the state of the art to which this invention pertains.

FIELD OF THE INVENTION 

The present invention relates generally to nucleotide sequences, and polypeptides 
encoded by the sequences, that are essential for bacterial viability, and to methods of 
using the nucleotide and polypeptide sequences. 

BACKGROUND OF THE INVENTION 

Bacterial genera, such as Streptococcus, Staphylococcus, Pseudomonas, Yersinia,
Salmonella, and Enterobacter, are the cause of numerous afflictions in humans and 
animals. Bacterial infection can lead to serious health conditions, including pneumonia, 
osteomyelitis, meningitis, sinusitis, otitis, cystitis, and even food poisoning. Typically, 
these infections can be treated with standard antimicrobial agents such as antibiotics. 
However, the emergence of pathogenic bacterial strains that are resistant to antibiotics 
has risen alarmingly in the past two decades. This situation has created an urgent need 
for the development of new antimicrobial agents. 

One strategy for developing new antimicrobial agents is to identify bacterial gene 
sequences that encode gene products that are essential for bacterial cell viability and
develop and/or identify agents which inhibit the function of the gene product. DNA
sequencing technology has advanced from sequencing one gene at a time to sequencing
entire genomes, the sum of all genes in an organism. With the recent arrival of bacterial
genomic information, it is now possible to compare multiple bacterial genomes in an
attempt to identify genes that encode conserved gene products. In this manner, one
skilled in the art may identify a set of conserved bacterial genes, including a subset of 
genes that are essential for bacterial cell viability. The essential gene is then used as a 
starting point to develop therapeutic agents that inhibit or inactivate the product of the 
essential gene. 

The availability of DNA sequence information for multiple microbial genomes is a recent 
development. The public release of the first complete genome, Haemophilus influenzae 
(Fleischmann, R.D., et al., 1995 Science 269:496-512), was followed in rapid succession
by a number of public and private genome sequencing programs. Presently, some 20
completely sequenced bacterial genomes have been published, and over 100 other
sequencing projects are underway (Blattner, F.R., et al., 1997 Science 277:1453-74;
Ferretti, J.J., et al., 1997 Adv Exp Med Biol 418:961-963; Koonin, E.V., et al., 1996
Methods Enzymol 266:295-322). Analyses of these data indicate that approximately
46% of putative bacterial genes have no attributable function.

Others have pursued various strategies to identify bacterial genes that are essential for
viability. These strategies include identifying genes that are expressed by the bacteria
when present in the infected host (Hensel, M., et al., 1995 Science 269:400-3),
identifying essential genes by isolating temperature-sensitive mutants (Schmid, M.B., et
al., 1998 Curr Opin Chem Biol 2:529-34), and identifying genes in pathways known
from prior physiological studies to be essential (Skarzynski, T., et al., 1996 Structure
4:1465-74).

There continues to be a need to identify bacterial genes that encode gene products that are 
essential for cell viability, such as cell replication, growth, and survival. These genes and
their encoded gene products can be used as a starting point towards identifying agents
that inhibit functions essential for cell viability, thereby causing bacterial cell stasis or
death (e.g., antibacterial agents).

The present invention provides experimental identification of novel, conserved essential 
genes (ceg) from bacteria and their encoded protein products. The ceg genes are
considered essential to cell viability because disruption of an endogenous ceg gene results
in lethality of a bacterial cell (e.g., as determined by failure to recover viable
chloramphenicol-resistant colonies, as described herein). Thus, the gene products
encoded by these genes are potentially valuable targets for chemotherapeutic intervention
in bacterial infections.
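The essentiality criterion above can be expressed as a simple decision rule. The sketch below is illustrative only: the function name, the colony-count inputs, and the use of a parallel control transformation to flag inconclusive experiments are assumptions, not the procedure described in the Examples.

```python
from enum import Enum


class Call(Enum):
    ESSENTIAL = "essential"
    NON_ESSENTIAL = "non-essential"
    INCONCLUSIVE = "inconclusive"


def classify_disruption(cm_resistant_colonies: int, control_colonies: int) -> Call:
    """Hypothetical decision rule for one gene-disruption experiment.

    cm_resistant_colonies: viable chloramphenicol-resistant colonies recovered
        after attempting to disrupt the target gene.
    control_colonies: colonies from a parallel control transformation
        (an assumption here, used only to separate a lethal disruption
        from a transformation that simply failed).
    """
    if cm_resistant_colonies < 0 or control_colonies < 0:
        raise ValueError("colony counts must be non-negative")
    if control_colonies == 0:
        # No colonies even in the control: the experiment itself may have failed.
        return Call.INCONCLUSIVE
    if cm_resistant_colonies == 0:
        # Disrupted cells could not be recovered alive: scored as essential.
        return Call.ESSENTIAL
    return Call.NON_ESSENTIAL


print(classify_disruption(0, 120).value)   # essential
print(classify_disruption(35, 120).value)  # non-essential
```

Keeping the control separate avoids scoring a gene as essential when zero colonies reflect a technical failure rather than lethality.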

The ceg nucleotide sequences of the invention were obtained by large-scale 
computational comparisons of multiple genome sequences to identify conserved protein 
coding regions, followed by gene disruption to identify cegs. The conservation of protein
sequences in many cases is believed to reflect the higher-level conservation of common
biochemical pathways essential for bacterial function and viability.
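The comparative step can be pictured as a presence/absence filter over gene families. In the sketch below, each genome is reduced to a set of gene-family identifiers; how proteins are grouped into families (for example, by sequence similarity searches) is assumed to have been done already, and the function name and family IDs are hypothetical.

```python
from typing import Dict, Optional, Set


def conserved_families(
    genomes: Dict[str, Set[str]], min_genomes: Optional[int] = None
) -> Set[str]:
    """Return gene families found in at least `min_genomes` genomes.

    genomes: maps a genome name to the set of gene-family IDs it encodes.
    min_genomes: presence threshold; defaults to all genomes (strict conservation).
    """
    if not genomes:
        return set()
    threshold = len(genomes) if min_genomes is None else min_genomes
    counts: Dict[str, int] = {}
    for families in genomes.values():
        for family in families:
            counts[family] = counts.get(family, 0) + 1
    return {family for family, n in counts.items() if n >= threshold}


# Toy input with hypothetical family IDs.
genomes = {
    "Haemophilus influenzae": {"famA", "famB", "famC"},
    "Escherichia coli": {"famA", "famB", "famD"},
    "Staphylococcus aureus": {"famA", "famB"},
}
print(sorted(conserved_families(genomes)))  # ['famA', 'famB']
```

Families passing this filter are only candidates; in the approach described above, essentiality is then tested experimentally by gene disruption.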

SUMMARY OF THE INVENTION 

The acronyms "CEG" and "ceg" stand for Conserved Essential Gene. For convenience,
the italicized term ceg refers herein to ceg nucleotide sequences. The capitalized term CEG 
refers herein to CEG polypeptide sequences. 

Embodiments of the ceg nucleotide sequences and the CEG polypeptide sequences are
designated CFEs, which stands for CEG For Expression. The CFEs are polypeptides
resulting from expression of the ceg nucleotide sequence.

The present invention provides isolated nucleotide sequences of conserved essential
genes from bacteria, designated ceg. The invention also provides recombinant nucleic
acid molecules including the ceg sequences of the invention, and methods of use thereof.
Examples of nucleic acid molecules having ceg sequences are described in SEQ ID
NOS.: 1-113. The invention further provides isolated polypeptides and recombinant
polypeptides having the CEG sequences of the invention, and methods of use thereof.
Examples of polypeptides having CEG sequences are described in SEQ ID NOS.: 114-226.

The ceg sequences of the present invention are DNA or RNA. Further, the invention 
includes nucleic acid molecules that are identical or nearly identical (e.g., similar) with 
the ceg sequences of the invention. The invention additionally provides polynucleotide 
sequences that hybridize under stringent conditions to the ceg sequences of the invention.

A further embodiment provides polynucleotide sequences which are complementary to
the ceg sequences of the invention. Yet another embodiment provides ceg nucleic acid 
molecules that are labeled with a detectable marker. Another embodiment provides 
recombinant nucleic acid molecules, such as a vector or a fusion molecule, including the 
ceg sequences of the invention. 
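For double-stranded DNA, the sequence complementary to a given strand, read in the conventional 5' to 3' direction, is its reverse complement. A minimal sketch, assuming an unambiguous A/C/G/T alphabet (IUPAC ambiguity codes and RNA are not handled); the function name is illustrative.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, preserving case."""
    invalid = set(seq.upper()) - set("ACGT")
    if invalid:
        raise ValueError(f"unsupported bases: {sorted(invalid)}")
    # Complement each base, then reverse so the result reads 5' to 3'.
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGCCG"))  # CGGCAT
```

Applying the function twice returns the original sequence, which is a quick sanity check on any complement table.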

The present invention provides various ceg sequences, fragments thereof having essential 
gene activity, and related molecules such as antisense molecules, oligonucleotides, 
peptide nucleic acids (PNA), fragments, and portions thereof.

The present invention relates to the inclusion of the polynucleotides encoding CEG gene
products, such as CEG polypeptides, in an expression vector which can be used to
transform host cells or organisms. Such transgenic hosts are useful for the production of
CEG gene products for the development of antibacterial agents such as antibiotics.

The invention further provides substantially purified CEG gene products, and uses
thereof. 

The invention also relates to pharmaceutical compositions comprising antisense
molecules capable of disrupting expression of ceg sequences, agonists, antagonists or
inhibitors of CEG gene products, and antibodies reactive against the CEG polypeptides.
These compositions are useful for preventing the growth or survival of bacteria, for
example, in the treatment of conditions associated with bacterial infections.

BRIEF DESCRIPTION OF THE FIGURES 

Figure 1: A schematic representation of the gene disruption assay, as described in Example
3, infra. A) A recombinant vector undergoing homologous recombination with the host
genome. B) The result of homologous recombination. 

Figure 2: A schematic representation of the polarity test for operons, as described in
Examples 2 and 3, infra. A) The recombinant vector undergoing homologous
recombination with the host genome. B) Case 1: one possible result of homologous
recombination; the downstream Gene B has an independent promoter. C) Case 2: another
possible result of homologous recombination; the downstream Gene B does not have an
independent promoter.

Figure 3: Purification of 2CFE 75, as described in Example 6, infra. A) Fractionation
profile of 2CFE 75 eluted from a Ni-NTA column. B) Gel electrophoresis of pooled
fractions of 2CFE 75. C) Non-denaturing gel electrophoresis to determine the oligomeric
form of 2CFE 75.

Figure 4: Fractionation profile of 2CFE 3 eluted from a hydroxyapatite column, as described
in Example 7, infra.

Figure 5: The biosynthesis pathway of Coenzyme A, which starts with phosphorylation of
pantothenate.

Figure 6: Circular dichroism spectra of 2CFE 101 and 103, as described in Example 10,
infra. A) Circular dichroism spectra of 2CFE 101 and 103 at 25 degrees C. B) Circular
dichroism thermal melt spectra of 2CFE 101 and 103 over a range of zero to 100 degrees C.

Figure 7: Circular dichroism spectra of aggregate and monomer pools of 2CFE 101 and 103,
as described in Example 10, infra. A) Circular dichroism spectra of aggregate and monomer
pools of 2CFE 101 and 103 at 25 degrees C. B) Circular dichroism thermal melt spectra of
aggregate and monomer pools of 2CFE 101 and 103 over a range of zero to 100 degrees C.

Figure 8: Absorbance spectra of pantothenate-dependent production of ADP, as described in
Example 10, infra.

Figure 9: The results of size exclusion chromatography and gel electrophoresis showing the
oligomeric forms of 2CFE 21 and 39, as described in Example 11, infra. Lanes 1-6 contain
2CFE 21, lane 7 is a molecular weight marker, and lanes 8-10 contain 2CFE 39.

Figure 10: Gel electrophoresis of a helicase reaction using 2CFE 21 and 39 and a radiolabeled
synthetic Holliday Junction template, as described in Example 11, infra. Lane 1 contains
the synthetic Holliday Junction template; lane 2 contains the synthetic duplex; lane 3
contains a single-stranded template; lane 4 contains the helicase reaction using 2CFE 39;
lane 5 contains the helicase reaction using 2CFE 21; lanes 6-8 contain the helicase reaction
using 2CFE 39 and 21 at varying concentrations (e.g., 1, 2, and 3 µM each); and lane 9
contains the helicase reaction using 2 µM each of 2CFE 39 and 21 in the presence of ethidium
bromide.

Figure 11: A graph depicting the results of the helicase reaction, which were monitored by
measuring the unquenching of the Holliday Junction templates with time, as described in
Example 11, infra.

Figure 12: Capillary electrophoresis results of 2CFE 8 with and without ssDNA, as
described in Example 12, infra. A) Electropherogram of 2CFE 8 alone. B)
Electropherogram of 2CFE 8 in the presence of a 32-nucleotide single-stranded oligomer.

Figure 13: Gel mobility shift assay of 2CFE 8, and 2CFE 8 in the presence of a
single-stranded 32-mer, as described in Example 12, infra. A) An ethidium bromide-stained,
native polyacrylamide gel containing 2CFE 8, and 2CFE 8 in the presence of a 32-mer. B)
The same native polyacrylamide gel stained with Coomassie.

Figure 14: The N-acetyl glucosamine pathway putatively mediated by 2CFE 3 and 2CFE
86, as described in Example 13, infra.

Figure 15: Capillary electrophoresis results of 2CFE 3 with and without putative substrates,
as described in Example 13, infra. A) Electropherogram of 2CFE 3 with and without
glucosamine-1-phosphate. B) Electropherogram of 2CFE 3 with and without
D-glucose-1-phosphate. C) Electropherogram of 2CFE 3 alone, 2CFE 3 and glucose-1-phosphate,
and 2CFE 3 and glucose-6-phosphate. D) Electropherogram of 2CFE 3 alone or in the presence
of glucosamine-1-phosphate, glucosamine-6-phosphate, D-glucose, D(+) galactose, and
α-D-glucose-1-phosphate.

Figure 16: Capillary electrophoresis results of FITC-derivatized 2CFE 3 polypeptide with
and without D-glucosamine-6-phosphate (substrate) to produce the product
D-glucosamine-1-phosphate, using laser-induced fluorescence, as described in Example 13,
infra. Electropherogram of D-glucosamine-6-phosphate (putative substrate), 2CFE 3 reacted
with D-glucosamine-6-phosphate, and the product glucosamine-1-phosphate.

Figure 17: Gel electrophoresis of 2CFE 86 eluted from an Ni-NTA column, as described in 
Example 13, infra. 

Figure 18: HPLC analysis of a coupled reaction including 2CFE 3, 2CFE 86, and
D-glucosamine-6-phosphate to produce the product, UDP-N-acetylglucosamine-1-phosphate
(UDPAG), as described in Example 13, infra.

Figure 19: A fatty acid biosynthesis pathway.

Figure 20: Size exclusion chromatography to determine the molecular weight and
oligomeric form of 2CFE 34, as described in Example 14, infra. Selected eluted samples
were sized by gel electrophoresis.

Figure 21: Gel electrophoresis of 2CFE 41 eluted from a Ni-NTA column, as described in 
Example 15, infra. 

Figure 22: Capillary electrophoresis results of 2CFE 40, 41, and 46, as described in Example 
15, infra. 

Figure 23: Depicts a schematic diagram of a ligand which binds 2CFE 34. The ligand is
2-phenyl-N-(3-carboxy-4-hydroxyphenyl) azabicyclo[4.3.0]nona-2,8-diene.

Figure 24: Depicts a schematic diagram of a ligand which binds 2CFE 43. The ligand is
N-(3,5-dinitrobenzyl)-7-trifluoromethyl benza diaza furanolactone.

Figure 25: Depicts a schematic diagram of a ligand which binds 2CFE 43. The ligand is
2-amino (N-para-methylphenyl sulfonamide)-3-phenylpropionic acid.

Figure 26: A nucleic acid sequence of 2CFE1 deposited with the American Type Culture 
Collection as ATCC designation on December 20, 2000. 

Figure 27: A nucleic acid sequence of 2CFE2 deposited with the American Type Culture 
Collection as ATCC designation on December 20, 2000. 

Figure 28: A nucleic acid sequence of 2CFE3 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 29: A nucleic acid sequence of 2CFE4 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 30: A nucleic acid sequence of 2CFE5 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 31: A nucleic acid sequence of 2CFE6 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 32: A nucleic acid sequence of 2CFE7 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 33: A nucleic acid sequence of 2CFE8 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 34: A nucleic acid sequence of 2CFE9 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 35: A nucleic acid sequence of 2CFE10 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 36: A nucleic acid sequence of 2CFE11 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 37: A nucleic acid sequence of 2CFE12 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 38: A nucleic acid sequence of 2CFE13 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 39: A nucleic acid sequence of 2CFE14 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 40: A nucleic acid sequence of 2CFE15 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 41: A nucleic acid sequence of 2CFE16 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 42: A nucleic acid sequence of 2CFE17 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 43: A nucleic acid sequence of 2CFE19 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 44: A nucleic acid sequence of 2CFE21 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 45: A nucleic acid sequence of 2CFE24 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 46: A nucleic acid sequence of 2CFE25 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 47: A nucleic acid sequence of 2CFE26 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 48: A nucleic acid sequence of 2CFE27 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 49: A nucleic acid sequence of 2CFE28 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 50: A nucleic acid sequence of 2CFE29 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 51: A nucleic acid sequence of 2CFE30 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 52: A nucleic acid sequence of 2CFE31 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 53: A nucleic acid sequence of 2CFE32 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 54: A nucleic acid sequence of 2CFE33 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 55: A nucleic acid sequence of 2CFE34 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 56: A nucleic acid sequence of 2CFE35 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 57: A nucleic acid sequence of 2CFE36 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 58: A nucleic acid sequence of 2CFE37 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 59: A nucleic acid sequence of 2CFE38 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 60: A nucleic acid sequence of 2CFE39 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 61: A nucleic acid sequence of 2CFE40 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 62: A nucleic acid sequence of 2CFE41 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 63: A nucleic acid sequence of 2CFE42 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 64: A nucleic acid sequence of 2CFE43 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 65: A nucleic acid sequence of 2CFE44 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 66: A nucleic acid sequence of 2CFE45 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 67: A nucleic acid sequence of 2CFE46 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 68: A nucleic acid sequence of 2CFE47 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 69: A nucleic acid sequence of 2CFE48 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 70: A nucleic acid sequence of 2CFE49 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 71: A nucleic acid sequence of 2CFE50 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 72: A nucleic acid sequence of 2CFE51 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 73: A nucleic acid sequence of 2CFE52 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 74: A nucleic acid sequence of 2CFE53 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 75: A nucleic acid sequence of 2CFE54 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 76: A nucleic acid sequence of 2CFE55 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 77: A nucleic acid sequence of 2CFE56 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 78: A nucleic acid sequence of 2CFE57 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 79: A nucleic acid sequence of 2CFE58 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 80: A nucleic acid sequence of 2CFE59 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 81: A nucleic acid sequence of 2CFE60 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 82: A nucleic acid sequence of 2CFE61 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 83: A nucleic acid sequence of 2CFE62 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 84: A nucleic acid sequence of 2CFE64 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 85: A nucleic acid sequence of 2CFE65 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 86: A nucleic acid sequence of 2CFE66 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 87: A nucleic acid sequence of 2CFE67 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 88: A nucleic acid sequence of 2CFE68 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 89: A nucleic acid sequence of 2CFE69 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 90: A nucleic acid sequence of 2CFE70 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 91: A nucleic acid sequence of 2CFE71 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 92: A nucleic acid sequence of 2CFE72 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 93: A nucleic acid sequence of 2CFE75 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 94: A nucleic acid sequence of 2CFE76 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 95: A nucleic acid sequence of 2CFE78 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 96: A nucleic acid sequence of 2CFE79 deposited with the American Type Culture 
Collection as ATCC designation on December 20, 2000. 

Figure 97: A nucleic acid sequence of 2CFE80 deposited with the American Type Culture 
Collection as ATCC designation on December 20, 2000. 

Figure 98: A nucleic acid sequence of 2CFE81 deposited with the American Type Culture 
Collection as ATCC designation on December 20, 2000. 

Figure 99: A nucleic acid sequence of 2CFE82 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 100: A nucleic acid sequence of 2CFE83 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 101: A nucleic acid sequence of 2CFE84 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 102: A nucleic acid sequence of 2CFE85 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 103: A nucleic acid sequence of 2CFE86 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 104: A nucleic acid sequence of 2CFE87 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 105: A nucleic acid sequence of 2CFE88 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 106: A nucleic acid sequence of 2CFE89 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 107: A nucleic acid sequence of 2CFE90 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 108: A nucleic acid sequence of 2CFE91 deposited with the American Type Culture 
Collection as ATCC designation on December 20, 2000. 

Figure 109: A nucleic acid sequence of 2CFE92 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 110: A nucleic acid sequence of 2CFE94 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 111: A nucleic acid sequence of 2CFE95 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 112: A nucleic acid sequence of 2CFE96 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 113: A nucleic acid sequence of 2CFE97 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 114: A nucleic acid sequence of 2CFE99 deposited with the American Type Culture
Collection as ATCC designation on December 20, 2000.

Figure 115: A nucleic acid sequence of 2CFE101 deposited with the American Type
Culture Collection as ATCC designation on December 20, 2000.

Figure 116: A nucleic acid sequence of 2CFE102 deposited with the American Type
Culture Collection as ATCC designation on December 20, 2000.

Figure 117: A nucleic acid sequence of 2CFE103 deposited with the American Type
Culture Collection as ATCC designation on December 20, 2000.

Figure 118: A nucleic acid sequence of 2CFE104 deposited with the American Type
Culture Collection as ATCC designation on December 20, 2000.

Figure 119: A nucleic acid sequence of 2CFE105 deposited with the American Type
Culture Collection as ATCC designation on December 20, 2000.

Figure 120: A nucleic acid sequence of 2CFE106 deposited with the American Type
Culture Collection as ATCC designation on December 20, 2000.

Figure 121: A nucleic acid sequence of 2CFE107 deposited with the American Type 
Culture Collection as ATCC designation on December 20, 2000. 

Figure 122: A nucleic acid sequence of 2CFE108 deposited with the American Type 
Culture Collection as ATCC designation on December 20, 2000. 

Figure 123: A nucleic acid sequence of 2CFE109 deposited with the American Type 
Culture Collection as ATCC designation on December 20, 2000. 

Figure 124: A nucleic acid sequence of 2CFE111 deposited with the American Type 
Culture Collection as ATCC designation on December 20, 2000. 

Figure 125: A nucleic acid sequence of 2CFE112 deposited with the American Type 
Culture Collection as ATCC designation on December 20, 2000. 

Figure 126: A nucleic acid sequence of 2CFE113 deposited with the American Type 
Culture Collection as ATCC designation on December 20, 2000. 

Figure 127: A nucleic acid sequence of 2CFE114 deposited with the American Type 
Culture Collection as ATCC designation on December 20, 2000. 

Figure 128: A nucleic acid sequence of 2CFE115 deposited with the American Type 
Culture Collection as ATCC designation on December 20, 2000. 

Figure 129: A nucleic acid sequence of 2CFE116 deposited with the American Type
Culture Collection as ATCC designation on December 20, 2000.

Figure 130: A nucleic acid sequence of 2CFE117 deposited with the American Type
Culture Collection as ATCC designation on December 20, 2000.

Figure 131: Schematic structures of alkaloids which are ligands, for example, of 2CFE42.

DETAILED DESCRIPTION OF THE INVENTION 

Definitions 

All scientific and technical terms used in this application have meanings commonly used in
the art unless otherwise specified. As used in this application, the following words or
phrases have the meanings specified.

As used herein, a ceg nucleic acid molecule is said to be "isolated" when the nucleic acid
molecule is substantially separated from contaminant nucleic acid molecules that encode
polypeptides other than CEGs. Additionally, an isolated nucleic acid molecule refers to any
RNA or DNA sequence obtained from a natural source, constructed by recombinant
methods, or synthesized. A skilled artisan can readily employ nucleic acid isolation
procedures to obtain an isolated nucleic acid molecule having ceg sequences.

The term "ceg" includes all isolated forms of ceg nucleotide and CEG amino acid sequences
disclosed herein. The ceg sequences encode gene products that have essential biological
functions in bacterial cells, such as, for example, nucleotide biosynthesis, amino acid
biosynthesis, DNA replication, RNA transcription, protein translation, DNA
recombination, DNA repair, biosynthesis of cofactors (e.g., Coenzyme A), biosynthesis
of prosthetic groups, cellular processes (e.g., chaperones, cell division, and polypeptide
secretion), energy metabolism (e.g., pentose phosphate pathway, glycolysis,
gluconeogenesis), fatty acid biosynthesis, cell wall biosynthesis, and/or biosynthesis of
purines, pyrimidines, nucleosides, and nucleotides. Accordingly, the gene products of the
ceg nucleotide sequences are required for viability of bacterial cells. The term "ceg" also
includes variants having nucleotide sequence similarity to the disclosed ceg sequences,
including sequences isolated from various bacterial genera and species, allelic variants,
mutant variants, and ceg variants that encode conservative and non-conservative amino acid
substitutions. The present invention also provides for all ceg sequences generated by
recombinant DNA technology, including complementary sequences, ceg sequences that
hybridize to the sequences of the invention under high-stringency hybridization conditions,
fusion genes comprising a ceg sequence, and codon usage variants.

The term "essential gene" refers to a nucleotide sequence that encodes a gene product
having a function which is required for cell viability. The term "essential protein" refers
to a polypeptide that is encoded by an essential gene and has a function that is required
for cell viability. Accordingly, a mutation that disrupts the function of the essential gene
or essential protein results in a loss of viability of cells harboring the mutation.

"Non-essential genes" or "non-essential proteins" refer to genes, or the protein(s) or RNAs
encoded therefrom, which, when disrupted by a mutation, do not result in a loss of viability
of cells harboring the mutation under defined laboratory conditions.

As used herein, a nucleotide sequence is said to be "identical" to another reference
sequence when both nucleotide sequences are exactly alike.

As used herein, a nucleotide sequence is said to be "similar" to another reference
sequence when a comparison of the two sequences shows that they have a low level of
sequence differences. For example, two sequences are considered to be similar to each
other when the percentage of nucleotides that are shared between the two sequences is
between about 70% and 99.99% over the entire length of the two sequences.

As used herein, an amino acid sequence is said to be "similar" to another reference
sequence when a comparison of the two sequences shows that they have a low level of
sequence differences. For example, two sequences are considered to be similar to each
other when the percentage of amino acids that are shared between the two sequences is
between about 30% and 100% over the entire length of the two sequences.
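
The percentages in the two definitions above can be checked with a short calculation. The following is a minimal sketch, assuming the two sequences have already been aligned to equal length with gaps written as "-"; in practice an alignment program would be run first. The function names and range constants are hypothetical and are not part of the specification.

```python
# Minimal sketch: percent identity of two pre-aligned sequences.
# Assumes equal-length input with gaps written as "-"; no alignment is performed here.

NUCLEOTIDE_SIMILAR_RANGE = (70.0, 99.99)  # percent range from the nucleotide definition
AMINO_ACID_SIMILAR_RANGE = (30.0, 100.0)  # percent range from the amino acid definition


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Return the percentage of aligned positions holding the same residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b and a != "-"
    )
    return 100.0 * matches / len(seq_a)


def is_similar(seq_a: str, seq_b: str, bounds: tuple) -> bool:
    """True if the percent identity falls within the inclusive (low, high) range."""
    low, high = bounds
    return low <= percent_identity(seq_a, seq_b) <= high


if __name__ == "__main__":
    print(percent_identity("ATGCATGCAT", "ATGCATGCAA"))  # one mismatch in ten positions
    print(is_similar("ATGCATGCAT", "ATGCATGCAA", NUCLEOTIDE_SIMILAR_RANGE))
```

The two 10-nucleotide sequences in the example differ at one position, giving 90% identity, which falls inside the nucleotide range; two exactly alike sequences score 100% and are "identical" rather than "similar" under the definitions above.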

As used herein, an "allele" or "allelic sequence" is an alternative form of the naturally-occurring
ceg sequence. Alleles result from a mutation that changes the nucleotide
sequence, and generally produce altered mRNAs or polypeptides whose structure or
function may or may not be altered.

"Substantially purified" as used herein means a specific isolated nucleic acid or protein,
or fragment thereof, in which substantially all contaminants (i.e., substances that differ
from said specific molecule) have been separated from said nucleic acid or protein.

In a host cell, an "endogenous" sequence as used herein means a nucleic acid sequence 
that is naturally-occurring and resides within the host genome. 

In a host cell, an "exogenous" sequence as used herein means an isolated nucleic acid
sequence that is introduced into the host cell using any one of a variety of introduction
methods, such as transfection, electroporation, or cationic lipid or salt treatment methods.

"Knockout mutant" or "knockout mutation" as used herein refers to an in vitro engineered
disruption of a region of endogenous chromosomal DNA (e.g., disruption of the genome),
typically within a protein coding region. A knockout mutation can be generated by
inserting an exogenous DNA sequence into the homologous endogenous sequence. A
knockout mutation occurring in a protein coding region is expected to disrupt normal
expression of the protein coding region. This usually leads to loss of the function
provided by the protein.

In order that the invention herein described may be more fully understood, the following 
description is set forth. 
A) MOLECULES OF THE INVENTION

1.) CEG NUCLEIC ACID MOLECULES 

The present invention provides isolated and recombinant ceg nucleic acid molecules and
fragments thereof, and related molecules, such as sequences complementary to ceg
sequences or a portion thereof, and those that hybridize to the nucleic acid molecules of
the invention.

The ceg polynucleotide sequences, also referred to herein as nucleic acid molecules of the
invention, are preferably in isolated form, including DNA, RNA, DNA/RNA hybrids, and
related molecules, and fragments thereof. Specifically contemplated are genomic DNA,
ribozymes, and antisense molecules, as well as nucleic acid molecules based on an
alternative backbone or including alternative bases, whether derived from natural sources or
synthesized. Embodiments of particular ceg polynucleotide and amino acid sequences
include, but are not limited to, the sequences described in Tables I and II (e.g., SEQ ID
NOS: 1-113, 114-226 and SEQ ID NOS: 227-331, 332-436, respectively). The ceg
polynucleotide and amino acid sequences were designated cfe, which stands for CEG For
Expression.

Biological samples of the 2CFE nucleic acid molecules (e.g., SEQ ID NOS: 227-331)
were deposited on December 20, 2000 with the American Type Culture Collection
(ATCC), 10801 University Blvd., Manassas, VA 20110-2209.



TABLE I

CFE Designation   SEQ. ID NO. (Nucleotide)   SEQ. ID NO. (Polypeptide)   POLARITY
CFE 1             1                          114                         +
CFE 2             2                          115
CFE 3             3                          116
CFE 4             4                          117                         +
CFE 5             5                          118
CFE 6             6                          119                         +
CFE 7             7                          120                         -
CFE 8             8                          121                         +
CFE 9             9                          122                         +
CFE 10            10                         123
CFE 11            11                         124
CFE 12            12                         125                         +
CFE 13            13                         126                         -
CFE 14            14                         127                         +
CFE 15            15                         128                         -
CFE 16            16                         129                         -
CFE 17            17                         130                         -
CFE 19            18                         131                         +
CFE 21            19                         132
CFE 24            20                         133                         -
CFE 25            21                         134
CFE 26            22                         135                         -
CFE 27            23                         136                         +
CFE 28            24                         137                         -
CFE 29            25                         138                         -
CFE 30            26                         139                         -
CFE 31            27                         140                         +
CFE 32            28                         141
CFE 33            29                         142                         -
CFE 34            30                         143                         +
CFE 35            31                         144                         +
CFE 36            32                         145                         +
CFE 37            33                         146
CFE 38            34                         147
                  35                         148
CFE 40            36                         149
CFE 41            37                         150
CFE 42            38                         151
CFE 43            39                         152
CFE 44            40                         153
CFE 45            41                         154
CFE 46            42                         155
CFE 47            43                         156                         -
CFE 48            44                         157
CFE 49            45                         158
CFE 50            46                         159                         +
CFE 51            47                         160
CFE 52            48                         161                         -
CFE 53            49                         162                         +
CFE 54            50                         163                         +
CFE 55            51                         164                         +
CFE 56            52                         165                         +
CFE 57            53                         166                         +
CFE 58            54                         167                         +
CFE 59            55                         168                         -
CFE 60            56                         169                         +
CFE 61            57                         170
CFE 62            58                         171
CFE 63            59                         172
CFE 64            60                         173                         +
CFE 65            61                         174                         +
CFE 66            62                         175
CFE 67            63                         176                         +
CFE 68            64                         177                         -
CFE 69            65                         178
CFE 70            66                         179                         +
CFE 71            67                         180
CFE 72            68                         181
CFE 73            69                         182                         +
CFE 74            70                         183
CFE 75            71                         184
CFE 76            72                         185
CFE 77            73                         186
CFE 78            74                         187
CFE 79            75                         188
CFE 80            76                         189
CFE 81            77                         190                         +
CFE 82            78                         191
CFE 83            79                         192
CFE 84            80                         193                         -
CFE 85            81                         194
CFE 86            82                         195                         -
CFE 87            83                         196                         -
CFE 88            84                         197                         -
CFE 89            85                         198
CFE 90            86                         199                         +
CFE 91            87                         200                         -
CFE 92            88                         201                         -
CFE 93            89                         202
CFE 94            90                         203                         +
CFE 95            91                         204                         +
CFE 96            92                         205                         +
CFE 97            93                         206                         -
CFE 98            94                         207
CFE 99            95                         208                         +
CFE 100           96                         209
CFE 101           97                         210                         -
CFE 102           98                         211
CFE 103           99                         212                         -
CFE 104           100                        213                         +
CFE 105           101                        214                         -
CFE 106           102                        215                         -
CFE 107           103                        216                         -
CFE 108           104                        217                         +
CFE 109           105                        218
                  106                        219
CFE 111           107                        220
CFE 112           108                        221
CFE 113           109                        222
CFE 114           110                        223
CFE 115           111                        224
CFE 116           112                        225
CFE 117           113                        226
TABLE II

CFE Designation   SEQ. ID NO. (Nucleotide)   SEQ. ID NO. (Polypeptide)   FIGURE
2CFE 1                                                                   26
2CFE 2                                                                   27
2CFE 3                                                                   28
2CFE 4                                                                   29
2CFE 5                                                                   30
2CFE 6                                                                   31
2CFE 7                                                                   32
2CFE 8                                                                   33
2CFE 9                                                                   34
2CFE 10                                                                  35
2CFE 11                                                                  36
2CFE 12                                                                  37
2CFE 13                                                                  38
2CFE 14                                                                  39
2CFE 15                                                                  40
2CFE 16                                                                  41
2CFE 17                                                                  42
2CFE 19                                                                  43
2CFE 21                                                                  44
2CFE 24                                                                  45
2CFE 25                                                                  46
2CFE 26                                                                  47
2CFE 27                                                                  48
2CFE 28                                                                  49
2CFE 29                                                                  50
2CFE 30                                                                  51
2CFE 31                                                                  52
2CFE 32                                                                  53
2CFE 33                                                                  54
2CFE 34                                                                  55
2CFE 35                                                                  56
2CFE 36                                                                  57
2CFE 37                                                                  58
2CFE 38                                                                  59
2CFE 39                                                                  60
2CFE 40                                                                  61
2CFE 41                                                                  62
2CFE 42                                                                  63
2CFE 43                                                                  64
2CFE 44                                                                  65
2CFE 45                                                                  66
2CFE 46                                                                  67
2CFE 47                                                                  68
2CFE 48                                                                  69
2CFE 49                                                                  70
2CFE 50                                                                  71
2CFE 51                                                                  72
2CFE 52                                                                  73
2CFE 53                                                                  74
2CFE 54                                                                  75
2CFE 55                                                                  76
2CFE 56                                                                  77
2CFE 57                                                                  78
2CFE 58                                                                  79
2CFE 59                                                                  80
2CFE 60                                                                  81
2CFE 61                                                                  82
2CFE 62                                                                  83
2CFE 64                                                                  84
2CFE 65                                                                  85
2CFE 66                                                                  86
2CFE 67                                                                  87
2CFE 68                                                                  88
2CFE 69                                                                  89
2CFE 70                                                                  90
2CFE 71                                                                  91
2CFE 72                                                                  92
2CFE 75                                                                  93
2CFE 76                                                                  94
2CFE 78                                                                  95
2CFE 79                                                                  96
2CFE 80                                                                  97
2CFE 81                                                                  98
2CFE 82                                                                  99
2CFE 83                                                                  100
2CFE 84                                                                  101
2CFE 85                                                                  102
2CFE 86                                                                  103
2CFE 87                                                                  104
2CFE 88                                                                  105
2CFE 89                                                                  106
2CFE 90                                                                  107
2CFE 91                                                                  108
2CFE 92                                                                  109
2CFE 94                                                                  110
2CFE 95                                                                  111
2CFE 96                                                                  112
2CFE 97                                                                  113
2CFE 99                                                                  114
2CFE 101                                                                 115
2CFE 102                                                                 116
2CFE 103                                                                 117
2CFE 104                                                                 118
2CFE 105                                                                 119
2CFE 106                                                                 120
2CFE 107                                                                 121
2CFE 108                                                                 122
2CFE 109                                                                 123
2CFE 111                                                                 124
2CFE 112                                                                 125
2CFE 113                                                                 126
2CFE 114                                                                 127
2CFE 115                                                                 128
2CFE 116                                                                 129
2CFE 117                                                                 130
a) Variant ceg Nucleotide Sequences

The present invention also provides nucleic acid molecules having a nucleotide sequence
substantially identical or similar to the ceg sequences (SEQ ID NOS: 1-113, 227-331)
disclosed herein.

The present invention provides nucleotide sequences which are similar to SEQ ID
NOS: 1-113 and/or SEQ ID NOS: 227-331. The present invention provides nucleotide
sequences which vary from SEQ ID NOS: 1-113 or 227-331 by a range of about 1% to
about 70%.

The present invention encompasses variations in polynucleotide sequences resulting from 
mutations and/or from transfer of genetic material from one cell to another (e.g., 
horizontal gene transfer or horizontal gene exchange). 

The present invention also provides for variants of the polynucleotide ceg sequences
disclosed herein, including variants isolated from naturally-occurring sources and those
generated by recombinant DNA technology or other in vitro synthesis methodologies
(e.g., PCR). The variant polynucleotide sequences of the invention encode polypeptides
that exhibit the biological activity of naturally-occurring CEG polypeptides, such as
activity required for bacterial cell viability.

In general, for example, a variant of ceg polynucleotide sequences may encode a
polypeptide that differs by one or more amino acid substitutions. The variant may have
conservative changes, wherein a substituted amino acid has similar structural or chemical
properties, e.g., replacement of leucine with isoleucine.

A polynucleotide sequence can encode conservative amino acid substitutions without
altering either the conformation or the function of the polypeptide. Such changes include
substituting any of isoleucine (I), valine (V), and leucine (L) for any other of these
hydrophobic amino acids; aspartic acid (D) for glutamic acid (E) and vice versa;
glutamine (Q) for asparagine (N) and vice versa; and serine (S) for threonine (T) and vice
versa. Other substitutions can also be considered conservative, depending on the
environment of the particular amino acid and its role in the three-dimensional structure of
the protein. For example, glycine (G) and alanine (A) can frequently be interchangeable,
as can alanine (A) and valine (V). Methionine (M), which is relatively hydrophobic, can
frequently be interchanged with leucine and isoleucine, and sometimes with valine.
Lysine (K) and arginine (R) are frequently interchangeable in locations in which the
significant feature of the amino acid residue is its charge and the differing pK values of
these two amino acid residues are not significant. Still other changes can be considered
"conservative" in particular environments.

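The substitution rules above can be expressed as a small lookup. This is a sketch only: the groupings are transcribed from the preceding paragraph, context-dependent pairs are reported separately because the text makes them conditional on the residue's environment, and the function names are hypothetical.

```python
# Sketch of the substitution rules described in the text (one-letter amino acid codes).

# Groups whose members the text describes as generally interchangeable.
CONSERVATIVE_GROUPS = [frozenset("IVL"), frozenset("DE"), frozenset("QN"), frozenset("ST")]

# Pairs the text describes as "frequently" or "sometimes" interchangeable, depending on context.
CONTEXT_DEPENDENT_PAIRS = {
    frozenset("GA"),
    frozenset("AV"),
    frozenset("ML"),
    frozenset("MI"),
    frozenset("MV"),
    frozenset("KR"),
}


def classify_substitution(original: str, replacement: str) -> str:
    """Classify a single amino acid change using the rules listed in the text."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return "identical"
    pair = frozenset((original, replacement))
    if any(pair <= group for group in CONSERVATIVE_GROUPS):
        return "conservative"
    if pair in CONTEXT_DEPENDENT_PAIRS:
        return "context-dependent"
    return "non-conservative"


def list_substitutions(wild_type: str, variant: str):
    """Yield (position, original, replacement, class) for each change between equal-length sequences."""
    if len(wild_type) != len(variant):
        raise ValueError("sequences must have the same length")
    for pos, (a, b) in enumerate(zip(wild_type.upper(), variant.upper()), start=1):
        if a != b:
            yield pos, a, b, classify_substitution(a, b)
```

Comparing the hypothetical peptides MLDGK and MIEWR, for example, yields two conservative changes (L to I, D to E), one non-conservative change (G to W, the glycine-to-tryptophan case mentioned in the text), and one context-dependent change (K to R).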
A variant may also have nonconservative changes, e.g., replacement of a glycine with a
tryptophan. Other variations may also include amino acid deletions or insertions, or both.
Guidance in determining which and how many amino acid residues may be substituted,
inserted, or deleted without abolishing biological or immunological activity may be found
using computer programs well known in the art, for example, DNASTAR software.

Another type of ceg sequence variant includes naturally-occurring allelic variants of ceg
which share significant similarity (e.g., between about 30% and 99%) to the disclosed CEG
polypeptide sequence. Allelic variants of the ceg sequences can encode conservative or
non-conservative amino acid substitutions of the CEG polypeptide sequence herein
described.

An example of allelic variants of ceg are mutant alleles of ceg polynucleotide sequences that
encode a polypeptide having one or more changes in the polypeptide sequence, such as
amino acid substitutions, deletions, insertions, frame shifts, or truncations. The mutant
alleles of ceg may or may not encode a CEG polypeptide having the same biological
functions as wild-type CEG proteins.
Variations in the bacterial genomic sequences can also arise from transfer of genetic
material to another bacterial cell. The transfer of gene sequences can occur intraspecies
or interspecies. Gene transfer can occur between bacterial cells which are members of
the same or different populations. A population includes, but is not limited to, a serotype
isolate, a clinical isolate, a naturally-occurring isolate, a strain, and a species. The
transfer of genetic material can occur between cells within a population; for example,
transfer from serotype A to serotype A, or between S. pneumoniae and S. pneumoniae.
The transfer of genetic material can occur between cells of different populations; for
example, from serotype A to serotype B, or between S. pneumoniae and S. mutans.

Gene transfer can give rise to mutant or polymorphic variant gene sequences. In rare
cases, gene transfer introduces new gene sequences that confer a new phenotype, such as
antibiotic resistance. The transfer of genetic material includes transfer of large regions of
genomic sequences which include partial gene sequences, whole single gene sequences,
or multiple gene sequences. This mode of transfer can give rise to replacement of native
whole gene sequences or introduction of new sequences in the recipient cell, and gives
rise to mosaic gene sequences in the recipient cell.

The variation of genomic sequences resulting from gene transfer can be examined using
molecular techniques, including multilocus enzyme electrophoresis (Selander, R. K., et
al., 1986 Appl. Environ. Microbiol. 51:837-884); restriction endonuclease cleavage
electrophoretic profiling (Coffey, T. J., et al., 1991 Mol. Microbiol. 5:2255-2260);
pulsed-field gel electrophoresis fingerprinting (Bygraves, J. A. and Maiden, M. C. J. 1992
J. Gen. Microbiol. 138:523-531); and ribotyping (Stull, T. L., et al., 1988 J. Infect. Dis.
157:280-286). The degree of variation can vary greatly, and ranges from little or no
variation, as exemplified by gene sequences of E. coli (Caugant, D. A., et al., 1981
Genetics 98:467-490; Whittam, T. S., et al., 1983 Mol. Biol. Evol. 1:67-83; Souza, V., et
al., 1992 Proc. Natl. Acad. Sci. USA 89:8389-8393) and Salmonella (Selander, R. K., et
al., 1990 Infect. Immun. 58:2262-2275; Selander, R. K. and Smith, R. H. 1990 Rev. Med.
Microbiol. 1:219-228; Smith, J. M., et al., 1993 Proc. Natl. Acad. Sci. USA
90:4384-4388), to extensive gene transfer in Neisseria gonorrhoeae (Smith, J. M., et al.,
1993 Proc. Natl. Acad. Sci. USA 90:4384-4388).

Gene transfer can be examined between various isolates of a particular microbial species
which are antibiotic-sensitive or antibiotic-resistant (Coffey, T. J., et al., 1991 Mol.
Microbiol. 5:2255-2260). Molecular biology techniques can be utilized to study the
degree of transfer between populations, such as, for example, the degree of gene transfer
between serotypes, isolates, strains, or species. The degree of transfer can be examined
by comparing, for example, the penicillin binding proteins and numerous different loci
which encode metabolic enzymes or capsular biosynthesis enzymes.

For example, intraspecies, inter-serotype gene transfer is possible (Coffey, T. J., et al.,
1991 supra). Additionally, intraspecies gene transfer is possible in S. pneumoniae
(Coffey, T. J., et al., 1998 Mol. Microbiol. 27:73-83), Vibrio cholerae (Bik, E. M., et al.,
1995 EMBO J. 14:209-216), and Haemophilus influenzae (Kroll, J. S. and Moxon, E. R.
1990 J. Bacteriol. 172:1374-1379).

Interspecies gene transfer is also possible (Dowson, C. G., et al., 1989 Proc. Natl. Acad.
Sci. USA 86:8842-8846; Laibl, G., et al., 1991 Mol. Microbiol. 5:1993-2002; Bourgoin,
F., et al., 1999 Gene 233:151-161).

Variant gene sequences arising from gene transfer can be continually generated in
transformable (e.g., transformation-competent) bacteria, such as S. pneumoniae. For
example, the worldwide spread of varying degrees of antibiotic resistance has been
documented and reviewed (Dowson, C. G., et al., 1994 Trends Microbiol. 2:361-366;
Spratt, B. G. in Bacterial Cell Wall, eds. Ghuysen, J-M. and Hakenbeck, R. 1994
pp. 517-534; and reviewed in Maiden, M. C. J. 1998 Clin. Infect. Dis. 27 (Supplement 1)
S12-S20). Variant gene sequences arising from gene transfer can be tracked using a
marker gene, such as the gene which encodes the penicillin binding protein (Barcus,
V. A., et al., 1995 FEMS Microbiol. Lett. 126:299-303). At the nucleotide level, gene
sequences encoding the penicillin binding proteins in susceptible and resistant strains
differ by about 14% to 23% (Hakenbeck, R. 1995 Biochem. Pharmacol. 50:1121-1127;
Spratt, B. G. in Bacterial Cell Wall, eds. Ghuysen, J-M. and Hakenbeck, R. 1994
pp. 517-534; Spratt, B. G., et al., 1991 Neisseria meningitidis and Streptococcus
pneumoniae, eds. Camisi, J., et al., pp. 73-83; Coffey, T. J., et al., 1995 Micro. Drug
Resist. 1:29-34).



The ceg nucleotide sequences can be isolated from various species of Streptococcus,
including Streptococcus pneumoniae. Additionally, the ceg sequences can be isolated from
other streptococcal species, including S. mutans, S. pyogenes, and S. thermophilus. The ceg
polynucleotide sequences can also be isolated from strains of other bacterial genera,
including, but not limited to, Streptococcus, Escherichia, Bacillus, Pseudomonas,
Yersinia, Salmonella, and Haemophilus.

The present invention additionally provides isolated codon-usage variants that differ from
the disclosed ceg nucleotide sequences, yet do not alter the predicted CEG polypeptide
sequence or function. The codon-usage variants may be generated by recombinant DNA
technology. Codons may be selected to optimize the level of production of the ceg
transcript or CEG polypeptide in a particular prokaryotic or eukaryotic expression host,
in accordance with the frequency of codon usage of the host cell. Alternative reasons
for altering the nucleotide sequence encoding a CEG polypeptide include the production
of RNA transcripts having more desirable properties, such as an extended half-life or
increased stability. As a result of the degeneracy of the genetic code, a multitude of
variant ceg nucleotide sequences that encode the respective CEG polypeptide may be
isolated. Accordingly, the present invention contemplates selecting every possible triplet
codon to generate every possible combination of nucleotide sequences that encode the
disclosed CEG polypeptides. This particular embodiment provides isolated nucleotide
sequences that vary from the sequences as described in SEQ ID NOS: 1-113 or 227-331,
such that each variant nucleotide sequence encodes a polypeptide having sequence
identity with the amino acid sequences as described in SEQ ID NOS: 114-226 or 332-436,
respectively.
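
The degeneracy argument above can be made concrete for a short peptide. The sketch below assumes the standard genetic code (NCBI translation table 1; the bacterial table 11 assigns the same amino acid to every codon and differs only in its alternative start codons) and uses hypothetical function names. Because the number of encodings is the product of the codon counts for each residue, exhaustive enumeration is practical only for short stretches.

```python
# Sketch: enumerating synonymous codon choices for a short peptide under the
# standard genetic code. "*" marks stop codons.
from itertools import product

BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Reverse map: amino acid -> list of codons that encode it.
SYNONYMOUS_CODONS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMOUS_CODONS.setdefault(aa, []).append(codon)


def count_encodings(peptide: str) -> int:
    """Number of distinct DNA sequences encoding the peptide (stop codon not included)."""
    total = 1
    for aa in peptide:
        total *= len(SYNONYMOUS_CODONS[aa])
    return total


def enumerate_encodings(peptide: str):
    """Yield every DNA sequence encoding the peptide, one codon choice per residue."""
    for codons in product(*(SYNONYMOUS_CODONS[aa] for aa in peptide)):
        yield "".join(codons)


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence, ignoring any trailing partial codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))
```

For the tripeptide MKL there are 1 × 2 × 6 = 12 encodings, each of which translates back to MKL. A practical codon-optimization tool would instead choose one codon per position, for example by host codon-usage frequency, rather than enumerate every combination.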
b) Complementary Sequences

The present invention includes polynucleotide sequences that are complementary to the
sequences disclosed herein. The term "complementary" as used herein refers to the
capacity of purine and/or pyrimidine nucleotides to associate through hydrogen bonding
to form double-stranded nucleic acid molecules. The following base pairs are related by
complementarity: guanine and cytosine; adenine and thymine; and adenine and uracil.
Complementarity applies to all base pairs formed between at least two single-stranded
nucleic acid molecules.
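
The pairing rules above can be expressed as a lookup table. This minimal sketch (hypothetical function names) returns the complement base by base, and the reverse complement, which is how the partner strand is conventionally written 5' to 3' because paired strands run antiparallel.

```python
# Minimal sketch of the base-pairing rules listed in the text (G-C, A-T, and A-U).
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complement(seq: str, rna: bool = False) -> str:
    """Base-by-base complement, written in the same left-to-right order as the input."""
    pairs = RNA_PAIRS if rna else DNA_PAIRS
    return "".join(pairs[base] for base in seq.upper())


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Complementary strand read 5'->3', since paired strands run antiparallel."""
    return complement(seq, rna)[::-1]
```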

c) Sequences Capable of Hybridizing 

Another embodiment provides nucleic acid molecules that will hybridize to ceg
sequences under hybridization conditions. It is readily apparent to one skilled in the art
that the stringency of the hybridization conditions selected will depend upon the
characteristics of the nucleic acid molecule to be hybridized, such as the length, the
degree of complementarity (e.g., exact or non-exact complementarity), the percent A/T
content, and the objective of the hybridization experiment.
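
Two of the factors listed above, length and A/T content, appear in a commonly used rule of thumb for estimating the melting temperature of short oligonucleotides, the Wallace rule: Tm ≈ 2 °C × (number of A + T) + 4 °C × (number of G + C). The sketch below is an illustration, not a method taken from this specification; the rule is intended only for short oligonucleotides and ignores salt, formamide, and probe concentration. The function names are hypothetical.

```python
# Illustration of the Wallace rule of thumb: about 2 degrees C per A/T and 4 degrees C per G/C.
# Not from this specification; it ignores salt, formamide, and probe concentration.


def at_content(seq: str) -> float:
    """Percent of bases that are A or T (U is counted with T for RNA)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    at = sum(seq.count(base) for base in "ATU")
    return 100.0 * at / len(seq)


def wallace_tm(seq: str) -> int:
    """Approximate melting temperature in degrees Celsius for a short oligonucleotide."""
    seq = seq.upper()
    at = sum(seq.count(base) for base in "ATU")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc
```

Under this approximation a G/C-rich probe has a higher estimated Tm than an A/T-rich probe of the same length, which is one way to see why A/T content bears on the choice of stringency.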

The hybridization procedure may be performed under low stringency hybridization
conditions. Low stringency hybridization conditions will permit hybridization between
two nucleic acid molecules that differ from exact complementarity by about 25% to 70%.
Hybridization under standard high stringency conditions will occur between two
complementary nucleic acid molecules (e.g., 100% exact complementarity) or two
complementary nucleic acid molecules that differ from exact complementarity by about
1% to about 70%.
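
The stringency ranges above are expressed as a departure from exact complementarity. A minimal sketch of that measure follows, assuming two gap-free strands of equal length, each written 5' to 3' and paired antiparallel; the function names are hypothetical.

```python
# Minimal sketch: percent complementarity between two equal-length strands,
# each written 5'->3' and paired antiparallel. Gapped alignments are not handled.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def percent_complementarity(probe: str, target: str) -> float:
    """Percent of positions forming a Watson-Crick pair when the strands are aligned antiparallel."""
    if len(probe) != len(target) or not probe:
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum(
        1
        for p, t in zip(probe.upper(), reversed(target.upper()))
        if (p, t) in WATSON_CRICK
    )
    return 100.0 * paired / len(probe)


def percent_mismatch(probe: str, target: str) -> float:
    """Departure from exact complementarity, in percent."""
    return 100.0 - percent_complementarity(probe, target)
```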

High stringency hybridization conditions that disfavor non-homologous base pairing
are well known in the art. Typically, high stringency hybridization conditions include,
but are not limited to, hybridizing at 50 °C to 65 °C in 5X SSPE and washing at 50 °C to
65 °C in 0.5X SSPE. Typically, low stringency conditions include, but are not limited to,
hybridizing at 35 °C to 37 °C in 5X SSPE and 40% to 45% formamide and washing at
42 °C in 1-2X SSPE. The conditions and formulas for high stringency hybridization
methods are well known in the art and can be readily obtained in Molecular Cloning: A
Laboratory Manual (2nd edition, Sambrook, Fritsch, and Maniatis 1989, Cold Spring
Harbor Press) or in Short Protocols in Molecular Biology (Ausubel, F. M., et al., 1989,
John Wiley & Sons).
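
The conditions above can be recorded as data and converted into mixing volumes with the dilution relation C1 × V1 = C2 × V2. This sketch assumes, beyond what the text states, that SSPE is diluted from a 20X stock and that formamide is added undiluted; the class and function names are hypothetical.

```python
# Sketch: the high- and low-stringency conditions from the text as data, plus the
# C1*V1 = C2*V2 dilution arithmetic. ASSUMPTIONS: 20X SSPE stock, undiluted formamide.
from dataclasses import dataclass


@dataclass(frozen=True)
class StringencyCondition:
    hyb_temp_c: tuple     # (min, max) hybridization temperature, degrees C
    hyb_sspe_x: float     # SSPE strength during hybridization
    formamide_pct: tuple  # (min, max) percent formamide in the hybridization buffer
    wash_temp_c: tuple    # (min, max) wash temperature, degrees C
    wash_sspe_x: tuple    # (min, max) SSPE strength during washing


HIGH = StringencyCondition((50, 65), 5.0, (0.0, 0.0), (50, 65), (0.5, 0.5))
LOW = StringencyCondition((35, 37), 5.0, (40.0, 45.0), (42, 42), (1.0, 2.0))


def stock_volume_ml(final_ml: float, final_x: float, stock_x: float = 20.0) -> float:
    """Volume of concentrated stock needed: V1 = C2 * V2 / C1."""
    return final_ml * final_x / stock_x


def hybridization_buffer(final_ml: float, sspe_x: float, formamide_pct: float = 0.0,
                         stock_x: float = 20.0) -> dict:
    """Volumes (mL) of SSPE stock, formamide, and water for one batch of buffer."""
    sspe_ml = stock_volume_ml(final_ml, sspe_x, stock_x)
    formamide_ml = final_ml * formamide_pct / 100.0
    water_ml = final_ml - sspe_ml - formamide_ml
    if water_ml < 0:
        raise ValueError("requested components exceed the final volume")
    return {"sspe_ml": sspe_ml, "formamide_ml": formamide_ml, "water_ml": water_ml}
```

With these assumptions, 100 mL of low-stringency hybridization buffer at 40% formamide works out to 25 mL of 20X SSPE, 40 mL of formamide, and 35 mL of water.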

d) Fragments of ceg Sequences 

The invention further provides nucleic acid molecules having fragments of the ceg
sequences, such as a portion of the ceg sequence (e.g., SEQ ID NOS: 1-113, 227-331)
disclosed herein. The size of the fragment will be determined by its intended use. For
example, the length of a fragment to be used as a nucleic acid probe or PCR primer is
chosen to obtain a relatively small number of false positives during probing or priming.
Alternatively, a fragment of the ceg sequence may be used to construct a recombinant fusion
gene having a ceg sequence fused to a non-ceg sequence.

The nucleic acid molecules, fragments thereof, and probes and primers of the present
invention are useful for a variety of molecular biology techniques including, for example,
hybridization screens of libraries, or detection and quantification of mRNA transcripts as
a means for analysis of gene transcription and/or expression. Preferably, the probes and
primers are DNA. A probe or primer length of at least 15 base pairs is suggested by
theoretical and practical considerations (Wallace, B. and Miyada, G. 1987
"Oligonucleotide Probes for the Screening of Recombinant DNA Libraries" in: Methods
in Enzymology, 152:432-442, Academic Press). Other lengths of fragments, probes, or
primers are possible and routine to determine.

The probes and primers of this invention can be prepared by methods well known to
those skilled in the art (Sambrook, et al., supra). In a preferred embodiment, the probes
and primers are synthesized by chemical synthesis methods (ed. Gait, M. J. 1984
Oligonucleotide Synthesis, IRL Press, Oxford, England).

One embodiment of the present invention provides nucleic acid primers that are
complementary to ceg sequences, which allow the specific amplification of nucleic acid
molecules of the invention or of any specific parts thereof. Another embodiment
provides nucleic acid probes that are complementary to, and selectively or specifically
hybridize to, the ceg sequences or any part thereof.
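
Specific amplification, as described above, requires that a primer bind the intended site and no other. A minimal sketch of an exact-match check on both strands follows; it is a simplification, since real primers can anneal with mismatches, and the function names are hypothetical.

```python
# Minimal sketch: locating exact binding sites of a primer on a template, on both strands.
# Real primers can anneal with mismatches; this exact-match search is a simplification.


def reverse_complement(seq: str) -> str:
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[b] for b in reversed(seq.upper()))


def primer_sites(primer: str, template: str):
    """Return (forward, reverse) 0-based start positions on the given template strand.

    forward: where the primer sequence itself occurs, so it anneals to the opposite strand;
    reverse: where its reverse complement occurs, so it anneals to the given strand.
    """
    primer, template = primer.upper(), template.upper()
    rc = reverse_complement(primer)
    last = len(template) - len(primer) + 1
    forward = [i for i in range(last) if template.startswith(primer, i)]
    reverse = [i for i in range(last) if template.startswith(rc, i)]
    return forward, reverse


def is_specific(primer: str, template: str) -> bool:
    """True if the primer has exactly one exact site across both strands."""
    forward, reverse = primer_sites(primer, template)
    return len(forward) + len(reverse) == 1
```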

e) Derivative Nucleic Acid Molecules

The nucleic acid molecules of the invention include peptide nucleic acids (PNAs), or
derivative molecules such as phosphorothioate, phosphotriester, phosphoramidate, and
methylphosphonate, that specifically bind to single-stranded DNA or RNA in a base
pair-dependent manner (Zamecnik, P. C., et al., 1978 Proc. Natl. Acad. Sci. 75:280-284;
Goodchild, P. C., et al., 1986 Proc. Natl. Acad. Sci. 83:4143-4146).

PNA molecules comprise a nucleic acid oligomer to which an amino acid residue, such as
lysine, and an amino group have been added. These small molecules, also designated
anti-gene agents, stop transcript elongation by binding to their complementary (template)
strand of nucleic acid (Nielsen, P. E., et al., 1993 Anticancer Drug Des. 8:53-63).
Reviews of methods for synthesis of DNA, RNA, and their analogues can be found in
Oligonucleotides and Analogues, ed. F. Eckstein, 1991, IRL Press, New York; and
Oligonucleotide Synthesis, ed. M. J. Gait, 1984, IRL Press, Oxford, England.

Additionally, methods for antisense RNA technology are described in U.S. Patent Nos.
5,194,428 and 5,110,802. A skilled artisan can readily obtain these classes of nucleic acid
molecules using the herein described ceg polynucleotide sequences; see, for example,
Innovation and Perspectives in Solid Phase Synthesis (1992) Egholm, et al., pp. 325-328,
or U.S. Patent No. 5,539,082.
f) RNA Molecules

The present invention provides RNA molecules that encode the predicted ceg gene 
products. In particular, the RNA molecules of the invention may be isolated full-length 
or partial mRNA molecules or RNA oligomers that encode CEG gene products. The 
RNA molecules of the invention include the nucleotide sequences encoding all or 
portions of CEGs. 

The RNA molecules of the invention also include antisense RNA molecules, peptide
nucleic acids (PNAs), or non-nucleic acid molecules such as phosphorothioate
derivatives, that specifically bind to the sense strand of DNA or RNA in a base
pair-dependent manner. A skilled artisan can readily obtain these classes of nucleic acid
molecules using the herein described ceg sequences.
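
The relationship between a gene, its mRNA, and an antisense RNA can be sketched at the level of base sequence. This minimal example (hypothetical function names) takes the coding-strand DNA, writes the corresponding mRNA, and derives an antisense RNA as the mRNA's reverse complement. It models only base pairing, not the modified backbones of PNAs or phosphorothioate derivatives.

```python
# Minimal sketch: the mRNA matches the coding (sense) DNA strand with U in place of T,
# and an antisense RNA is the reverse complement of that mRNA.
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def mrna_from_coding_strand(coding_dna: str) -> str:
    """mRNA sequence 5'->3' for a gene, given its coding-strand DNA 5'->3'."""
    return coding_dna.upper().replace("T", "U")


def antisense_rna(mrna: str) -> str:
    """Antisense RNA 5'->3' that base-pairs antiparallel with the given mRNA."""
    return "".join(RNA_PAIRS[base] for base in reversed(mrna.upper()))
```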

g) Labeled Nucleic Acid Molecules 

The nucleic acid molecules having ceg sequences can be labeled with a detectable
marker. Examples of a detectable marker include, but are not limited to, a radioisotope, a
fluorescent compound, a bioluminescent compound, a chemiluminescent compound, a
metal chelator, or an enzyme. Technologies for generating labeled DNA and RNA probes
are well known in the art (see, e.g., Sambrook et al., supra).

2.) RECOMBINANT NUCLEIC ACID MOLECULES 

Also provided are recombinant nucleic acid molecules, such as recombinant DNA molecules
(rDNAs), that comprise ceg sequences or fragments thereof. As used herein, a recombinant
DNA molecule is a DNA molecule that has been subjected to molecular manipulation in vitro.
Methods for generating rDNA molecules are well known in the art; for example, see Sambrook
et al., Molecular Cloning (1989), supra.
a) Vectors

The nucleic acid molecules of the invention may be recombinant molecules, each
comprising a ceg sequence, or a portion thereof, linked to a non-ceg sequence. For
example, the ceg sequence may be operatively linked to a vector to generate a
recombinant molecule. The term vector includes, but is not limited to, plasmids, cosmids,
and phagemids. A preferred vector includes an autonomously replicating vector
comprising a replicon that directs the replication of the rDNA within the appropriate host
cell. The preferred vectors can also include an expression control element, such as a
promoter sequence, which enables transcription of the inserted ceg sequences and can be
used for regulating the expression (e.g., transcription and/or translation) of an operably
linked ceg sequence in an appropriate host cell such as Escherichia coli. Expression control
elements are known in the art and include, but are not limited to, inducible promoters,
constitutive promoters, secretion signals, enhancers, transcription terminators, and other
transcriptional regulatory elements. Other expression control elements that are involved in
translation are known in the art, and include the Shine-Dalgarno sequence and initiation
and termination codons. The preferred vector also includes at least one selectable marker
gene that encodes a gene product that confers drug resistance, such as resistance to
ampicillin or tetracycline. The vector also comprises multiple endonuclease restriction sites
that enable convenient insertion of exogenous DNA sequences.
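The following Python sketch is offered only as an illustration of how candidate insertion sites in a multiple cloning site might be located before an exogenous ceg sequence is cloned. The enzyme table (EcoRI, BamHI, HindIII) and the function names find_sites and unique_cutters are assumptions chosen for the example, and the polylinker string is a hypothetical stand-in, not the sequence of any vector described herein.

```python
# Recognition sequences (5'->3') of three common Type II restriction enzymes.
# This small table is an illustrative assumption, not a vector specification.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_sites(sequence, sites=RECOGNITION_SITES):
    """Return 0-based start positions of each recognition site in `sequence`.

    All three sites above are palindromic, so scanning the top strand also
    finds every site on the bottom strand.
    """
    seq = sequence.upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)  # step by one to catch overlapping sites
        hits[enzyme] = positions
    return hits


def unique_cutters(sequence, sites=RECOGNITION_SITES):
    """Enzymes that cut exactly once, the most convenient for directed insertion."""
    return [enzyme for enzyme, pos in find_sites(sequence, sites).items() if len(pos) == 1]


# Hypothetical polylinker used only to exercise the functions.
polylinker = "GAATTCGAGCTCGGTACCCGGGGATCCTCTAGAGTCGACCTGCAGGCATGCAAGCTT"
print(find_sites(polylinker))      # {'EcoRI': [0], 'BamHI': [21], 'HindIII': [51]}
print(unique_cutters(polylinker))  # ['EcoRI', 'BamHI', 'HindIII']
```

A site that occurs only once in the vector is generally preferred, because cutting there opens the vector at a single defined position.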

The preferred vectors for generating ceg transcripts and/or the encoded CEG polypeptides
are expression vectors that are compatible with prokaryotic host cells. Prokaryotic cell
expression vectors are well known in the art and are available from several commercial
sources. For example, pET vectors (e.g., pET-21, Novagen Corp.) may be used to
express CEG polypeptides in bacterial host cells.
b) Recombinant Vectors for Integration

The present invention provides recombinant vectors that may be used to integrate
exogenously provided sequences into the genome of a host cell. The recombinant
integration vectors of the present invention include a gene that encodes a selectable
marker and ceg sequences, or fragments thereof. The integration vectors are used to
integrate the ceg sequence into a target gene sequence that resides within the bacterial
host genome (e.g., an endogenous sequence), thereby disrupting the function of the target
gene sequence within the bacterial cells. These integration vectors may be used in a gene
disruption assay to screen candidate ceg nucleotide sequences in order to identify the
candidate sequences that encode a gene product that is required for bacterial cell viability.

Accordingly, these recombinant integration vectors include candidate ceg sequences that
will be screened to determine whether they encode a gene product that is required for
cell viability. The candidate ceg sequence that is included as part of the recombinant
integration vector is the "exogenous" ceg sequence that is employed as the "disrupting"
sequence in a gene disruption assay. The ceg sequence that resides within the host
genome is the "endogenous" or "target" ceg sequence.

Integration can occur, albeit rarely, by non-homologous recombination, in which a
recombinant vector that includes the exogenous ceg sequence inserts the exogenous ceg
sequence into a random location within the host genome. In a more preferred
embodiment, the integration event inserts the exogenous ceg sequence into a specific
target site within the host genome. The targeted integration event can involve homologous
recombination, in which the integration vector that includes the exogenous ceg sequence
inserts the exogenous ceg sequence into its homologous target ceg sequence that resides
within the host's genome (e.g., the endogenous ceg sequence) (Figure 1). Further, the
exogenous ceg sequence can be used as a disrupting sequence, whereby the homologous
recombination event integrates the exogenous ceg sequence into the endogenous target
ceg sequence, resulting in disruption of the function of the endogenous ceg sequence.
For example, disrupting the function of the endogenous ceg sequence may result in the
loss of bacterial cell viability.
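The disruption logic of a single-crossover (Campbell-type) integration can be modeled with simple string operations. This Python sketch assumes that the cloned fragment is identical to, occurs once in, and has the same orientation as the chromosomal target. Under those assumptions, integration duplicates the fragment with the vector backbone between the two copies: if the fragment is internal to the target gene, neither copy contains the complete gene, whereas if the fragment spans the whole gene, one intact copy survives. The function names and sequences are hypothetical.

```python
def campbell_integrate(chromosome, fragment, backbone):
    """Model single-crossover (Campbell-type) integration of a circular plasmid.

    The plasmid is treated as `fragment` + `backbone` closed into a circle.
    Assumes `fragment` matches the chromosome exactly, once, in the same
    orientation. Because the homology is identical, a crossover anywhere
    inside it yields the same product: the fragment is duplicated and the
    backbone sits between the two copies.
    """
    start = chromosome.find(fragment)
    if start == -1:
        raise ValueError("fragment has no homologous target in the chromosome")
    end = start + len(fragment)
    return chromosome[:end] + backbone + chromosome[start:]


def gene_intact(genome, gene):
    """True if the complete gene is still present as one uninterrupted copy."""
    return gene in genome


# Hypothetical sequences: upper case = chromosome, lower case = vector backbone.
gene = "ATGGCTGCAAAACGTTGGTAA"
chromosome = "GGGGG" + gene + "CCCCC"
backbone = "nnnnnnnn"

internal = campbell_integrate(chromosome, "AAACGT", backbone)
print(internal)                     # GGGGGATGGCTGCAAAACGTnnnnnnnnAAACGTTGGTAACCCCC
print(gene_intact(internal, gene))  # False: both copies are truncated

whole = campbell_integrate(chromosome, gene, backbone)
print(gene_intact(whole, gene))     # True: one full-length copy survives
```

The contrast between the two cases illustrates why an internal fragment of the candidate gene, rather than the complete gene, is suited to use as the disrupting sequence.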

An example of a recombinant vector that can be used as an integration vector in S.
pneumoniae is the pEVP-3 vector (Jean-Pierre Claverys et al., 1995 Gene 164:123-128).
The pEVP-3 vector integrates an exogenous sequence by homologous recombination
involving a Campbell-type event (S. Adhya and A. Campbell, 1970 J. Mol. Biol.
50:481-490). The pEVP-3 vector includes a replicon that functions only in gram-negative
bacteria, such as E. coli. Therefore, the pEVP-3 vector cannot replicate in S.
pneumoniae. This vector also contains multiple cloning sites and confers resistance to
chloramphenicol in both gram-negative and gram-positive bacteria, such as S.
pneumoniae, so chloramphenicol-resistant S. pneumoniae transformants can arise only
when the vector has integrated into the host chromosome.

c) Fusion Gene Sequences 

A fusion ceg gene is another example of a recombinant molecule of the invention. A fusion
gene includes a ceg sequence operatively fused (e.g., linked) to a non-ceg sequence such as,
for example, a tag sequence to facilitate isolation and/or purification of the expressed
CEG gene product (Kroll, D.J., et al., 1993 DNA Cell Biol. 12:441-53).
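A tag fusion is useful only if the tag and the ceg coding sequence share one reading frame. The Python sketch below checks that condition, assuming the tag begins with ATG and lacks stop codons and that the ceg open reading frame (ORF) ends in its only stop codon. The six-histidine tag, the short ORF, and the function names are hypothetical examples, not sequences or methods disclosed herein.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a sequence into consecutive triplets, ignoring any trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def is_in_frame_fusion(tag, orf):
    """Return True if tag + orf reads as one uninterrupted open reading frame.

    Checks that both parts are whole numbers of codons, that the fused
    sequence starts with ATG, ends with a stop codon, and has no earlier
    in-frame stop codon.
    """
    if len(tag) % 3 or len(orf) % 3:
        return False
    fused = codons((tag + orf).upper())
    if not fused or fused[0] != "ATG" or fused[-1] not in STOP_CODONS:
        return False
    return not any(c in STOP_CODONS for c in fused[:-1])


# Hypothetical N-terminal tag: ATG start codon followed by six CAT (His) codons.
his_tag = "ATG" + "CAT" * 6
ceg_orf = "GCTAAAGGCTAA"  # hypothetical ORF: Ala-Lys-Gly-stop

print(is_in_frame_fusion(his_tag, ceg_orf))         # True
print(is_in_frame_fusion(his_tag + "G", ceg_orf))   # False: frameshift
print(is_in_frame_fusion(his_tag, "GCTTAAGGCTAA"))  # False: internal stop
```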

Alternatively, a recombinant fusion molecule has a ceg sequence of the invention fused to
a ceg sequence isolated from a different microbial source. For example, the disclosed ceg
sequences isolated from S. pneumoniae can be fused to a ceg sequence isolated from a
different bacterial species.
3.) CEG PROTEINS AND POLYPEPTIDE MOLECULES

The invention additionally provides CEG proteins and peptide fragments thereof that are
isolated or substantially purified. Embodiments of particular CEG amino acid sequences
are disclosed in Tables I and II (SEQ ID NOS:114-226 and SEQ ID NOS:332-436,
respectively).
The present invention also includes polypeptides having sequence variations from the
predicted CEG polypeptide sequences disclosed herein, including mutant variants,
conservative substitution variants, and similar CEG polypeptides from other prokaryotic
organisms. For convenience, such proteins are referred to herein as "CEG proteins",
"CEG polypeptides", or "proteins of the invention".

As used herein, CEG protein refers to a polypeptide having amino acid sequence identity or
similarity to any one of the predicted amino acid sequences provided in SEQ ID NOS:
114-226 or 332-436. The variant CEG polypeptides can be allelic forms of CEG, such as
mutant forms of CEG polypeptides. The present invention also provides conservative
substitution mutants of the CEG proteins that maintain the functional activity of wild-type
CEG (e.g., the CEG polypeptide is required for bacterial cell viability).
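Percent identity and similarity between a candidate polypeptide and a CEG sequence are normally computed from a pairwise alignment. The Python sketch below assumes the alignment has already been made (gaps written as '-') and scores similarity with a deliberately simple, illustrative grouping of residues; practical analyses typically use a substitution matrix such as BLOSUM62 instead. The grouping, the gap-handling convention, and the function names are assumptions.

```python
# Illustrative groups of physicochemically similar residues (an assumption);
# a substitution matrix such as BLOSUM62 is the more usual choice.
CONSERVATIVE_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"),
                       set("NQ"), set("ST"), set("AG")]


def is_conservative(a, b):
    """True if residues a and b fall in the same illustrative group."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def identity_and_similarity(aligned_a, aligned_b):
    """Percent identity and similarity over aligned, ungapped columns.

    Both inputs must have equal length, with '-' marking gaps. Columns where
    either sequence has a gap are excluded from the denominator, which is one
    of several conventions in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if x != "-" and y != "-"]
    if not columns:
        return 0.0, 0.0
    identical = sum(1 for x, y in columns if x == y)
    similar = sum(1 for x, y in columns if x == y or is_conservative(x, y))
    n = len(columns)
    return 100.0 * identical / n, 100.0 * similar / n


# Hypothetical aligned fragments: K/R and L/I count as conservative substitutions.
print(identity_and_similarity("MKLV-E", "MRIVAE"))  # (60.0, 100.0)
```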

The CEG protein may be isolated from any source, whether natural, synthetic,
semi-synthetic, or recombinant. As used herein, "natural" refers to a polypeptide that is
found in nature. Accordingly, the CEG proteins may be isolated from a prokaryotic
organism, such as a bacterial strain including, but not limited to, Streptococcus,
Escherichia, Bacillus, Pseudomonas, Yersinia, Salmonella, and Streptomyces. The CEG
proteins of the invention, and fragments thereof, can also be generated by recombinant
methods or chemical synthesis methods.

The CEG polypeptides of the invention are essential for the viability of a bacterial cell.
Further, the CEG polypeptides can exhibit at least one of the following functions: a
pantothenate kinase, a Holliday junction branch migration protein, a single-stranded
DNA binding protein, a phosphoglucosamine mutase, an acetyltransferase, a
uridylyltransferase, a malonyl CoA-ACP transacylase, a 3-oxoacyl-ACP synthase
II, a 3-oxoacyl-ACP reductase, a phosphomethylpyrimidine (HMP-P) kinase, a GTP
binding protein, an ATP binding protein, or a 4-aminoimidazole carboxylase. Putative
functions can include, but are not limited to, sugar transferase, teichoic acid biosynthesis,
ribosome recycling factor, response regulator, nicotinate phosphoribosyltransferase,
nitropropane dioxygenase, (3R)-hydroxymyristoyl acyl carrier protein dehydrase, sugar
dehydrogenase, murein biosynthesis, cobalamin biosynthesis, ABC transporter, tRNA
modification enzyme, arylsulfatase, 16S processing enzyme, tRNA methyltransferase,
elongation factor P, signal recognition particle, protein export, undecaprenol kinase, SRP
docking domain, diacylglycerol kinase, dihydrodipicolinate reductase, HU DNA binding
protein, thiamine biosynthase, GreA transcription elongation factor, dTDP-L-rhamnose
synthase, ATP-binding motif, ribose-5-phosphate 3-epimerase-like activity, GTP
pyrophosphokinase, acetyl-CoA carboxylase, O-sialoglycoprotein endopeptidase,
glucosamine-fructose-6-phosphate aminotransferase, Strpn adhesion-associated ABC
permease, GTP pyrophosphokinase RelA, IMP dehydrogenase, DNA gyrase subunit B,
acetyl-CoA carboxylase subunit AccD, phosphoglycerol kinase, acetyl-CoA carboxylase
carboxyltransferase, phosphopantetheine adenylyltransferase, oligopeptide transport
permease subunit, translocation protein, perM permease, DNA pol III gamma and tau
subunits, DNA pol III delta subunit, signal peptidase I, acetyl-CoA carboxylase biotin
carboxyl carrier protein, protein chain release factor 1, replicative DNA helicase,
topoisomerase, pentapeptide-transferase, elongation factor G, spore coat polysaccharide
biosynthesis protein C, protein release factor B, DNA polymerase III alpha subunit,
phosphoprotein phosphatase, chaperonin,
UDP-N-acetylmuramoylalanyl-D-glutamate-2,6-diaminopimelate ligase,
teichuronic acid biosynthesis, UDP-glucose lipid carrier transferase, transcription
termination factor, chromosome segregation factor, amino acid biosynthesis, HMG-CoA
reductase, and hypoxanthine-guanine phosphoribosyltransferase.

a) MODULATORS OF CEG POLYPEPTIDES

The invention provides compounds that modulate (e.g., activate or inhibit) the function of
a CEG polypeptide. Such compounds can provide lead compounds for developing drugs
for diagnosing and/or treating conditions associated with bacterial infections. The
modulator is a compound that may alter the function of the CEG polypeptide, such as by
activating or inhibiting the function of a CEG polypeptide. For example, the compound
can act as an agonist, antagonist, partial agonist, partial antagonist, cytotoxic agent,
inhibitor of cell proliferation, or cell proliferation-promoting agent. The activity of
the compound may be known, unknown, or partially known.

Suitable ligands include, but are not limited to, diazalactones, N-protected amino acids,
azabicyclodienes, and alkaloids.

An example of a diazalactone is:

[Chemical structure not reproduced.]

An example of an azabicyclodiene is:

[Chemical structure not reproduced.]

Examples of alkaloids are:

[Chemical structures not reproduced.]
B) METHODS FOR MAKING THE CEG PROTEINS AND POLYPEPTIDES

Recombinant methods are preferred if a high yield is desired. Recombinant methods
involve expressing the cloned gene in a suitable host cell. For example, an expression
vector having the CEG sequence is introduced into a host cell, and the host cell is then
cultured under conditions that permit in vivo production of the CEG protein. The
recombinant vector can integrate the CEG sequence into the host genome. Alternatively,
the CEG sequence can be maintained extrachromosomally as part of an autonomously
replicating vector.

1. HOST-VECTOR SYSTEMS

The invention further provides a host-vector system comprising a vector, plasmid,
phagemid, or cosmid comprising a ceg nucleotide sequence, or a fragment thereof,
introduced into a suitable host cell. The host-vector system can be used to produce the
CEG polypeptides encoded by the ceg nucleotide sequences. The host cell can be
prokaryotic or eukaryotic. Examples of suitable prokaryotic host cells include bacterial
strains from genera such as Escherichia, Bacillus, Pseudomonas, Streptococcus, and
Streptomyces. Examples of suitable eukaryotic host cells include a yeast cell, a plant cell,
or an animal cell, such as a mammalian cell. A preferred embodiment provides a
host-vector system comprising the pET21 vector having a ceg sequence introduced into an
E. coli λDE3 lysogen, which is useful, for example, for the production of the CEG protein,
herein designated CFE polypeptides and CFE proteins.

Introduction of the rDNA molecules of the present invention into an appropriate host cell is
accomplished by well-known methods that typically depend on the type of vector used and
the host system employed. For transformation of prokaryotic host cells, electroporation
and salt treatment methods are typically employed; see, for example, Cohen et al., 1972
Proc. Natl. Acad. Sci. USA 69:2110; Maniatis, T., et al., 1989 Molecular Cloning, A
Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY. For
transformation of vertebrate cells with vectors containing rDNAs, electroporation, cationic
lipid, or salt treatment methods are typically employed; see, for example, Graham et al.,
1973 Virology 52:456; Wigler et al., 1979 Proc. Natl. Acad. Sci. USA 76:1373-76.

Successfully transformed cells, i.e., cells that contain an rDNA molecule of the present
invention, can be identified by well-known techniques. For example, cells resulting from
the introduction of an rDNA of the present invention can be selected and cloned to produce
single colonies. Cells from those colonies can be harvested, lysed, and their DNA content
examined for the presence of the rDNA using a method such as that described by Southern,
J. Mol. Biol. (1975) 98:503, or Berent et al., Biotech. (1985) 3:208; alternatively, the
proteins produced by the cells can be assayed via a biochemical assay or immunological
method.

Prokaryotes are generally used as host cells for cloning and producing the products of
exogenous DNA sequences. For example, Escherichia coli BL21(λDE3) (Novagen) is
particularly useful for expression of foreign proteins. Other strains of E. coli, bacilli such
as Bacillus subtilis, Enterobacteriaceae such as Salmonella typhimurium or Serratia
marcescens, and various Pseudomonas, Streptococcus, and Streptomyces species may
also be employed as host cells in cloning and expressing the recombinant proteins of this
invention.

In general terms, the production of recombinant CEG proteins may involve using a 
host/vector system, or other methods may be used. The host/vector system may employ the 
following steps. 

A nucleic acid molecule is obtained that encodes a CEG protein or a fragment thereof, such
as any one of the polynucleotides disclosed in SEQ ID NOS: 1-113 or 227-331. The
CEG-encoding nucleic acid molecule is preferably inserted into an expression vector in
operable linkage with suitable expression control sequences, to generate an expression
vector including the CEG-encoding sequence. The expression vector is introduced into a
suitable host by standard transformation methods, and the resulting transformed host is
cultured under conditions that allow the production of the CEG protein. For example, if
expression of the CEG gene is under the control of an inducible promoter, then suitable
growth conditions would include the appropriate inducer. The CEG protein (e.g.,
designated a CFE polypeptide or protein) so produced is isolated from the growth medium
or directly from the cells; recovery and purification of the protein may not be necessary in
some instances where some impurities may be tolerated. A skilled artisan can readily
adapt an appropriate host/expression system known in the art for use with CEG-encoding
sequences to produce a CEG protein (Cohen et al., supra; Maniatis et al., supra).

Host cells harboring the nucleic acids disclosed herein are also provided by the present
invention. A preferred host is E. coli strain BL21(λDE3) transfected or transformed with
a vector comprising a nucleic acid of the present invention. The invention also provides a
host cell capable of expressing the ceg sequences described herein. The preferred host
cell is any strain of E. coli that can accommodate high-level expression of an exogenously
introduced gene.
The proteins of the present invention can also be made by chemical synthesis. The
principles of solid-phase chemical synthesis of polypeptides are well known in the art and
may be found in general texts relating to this area (Dugas, H. and Penney, C. 1981
Bioorganic Chemistry, pp 54-92, Springer-Verlag, New York). CEG polypeptides may
be synthesized by solid-phase methodology utilizing an Applied Biosystems 430A
peptide synthesizer (Applied Biosystems, Foster City, Calif.) and synthesis cycles
supplied by Applied Biosystems. Protected amino acids, such as
t-butoxycarbonyl-protected amino acids, and other reagents are commercially available
from many chemical supply houses.

The polypeptides of the invention exhibit properties of a CEG protein, such as, for
example, the ability to elicit the generation of antibodies that specifically bind an epitope
associated with CEG polypeptides. Accordingly, the CEG polypeptide, or any
oligopeptide thereof, is capable of inducing a specific immune response in appropriate
animals or cells and of binding with specific antibodies.

C) ANTIBODIES THAT RECOGNIZE AND BIND THE PROTEINS AND
POLYPEPTIDES OF THE INVENTION

The invention further provides antibodies (e.g., polyclonal, monoclonal, chimeric,
humanized, and human antibodies) that bind a CEG polypeptide. The most preferred
antibodies will selectively bind a CEG polypeptide and will not bind (or will bind weakly) a
non-CEG polypeptide. Antibodies that are particularly contemplated include monoclonal
and polyclonal antibodies, as well as fragments thereof (e.g., recombinant proteins) that
include the antigen binding domain and/or one or more complementarity-determining
regions of these antibodies. These antibodies can be from any source, for example, rabbit,
sheep, rat, dog, cat, pig, horse, mouse, and human.

The invention encompasses antibody fragments that specifically recognize a CEG
polypeptide. As used herein, an antibody fragment is defined as at least a portion of the
variable region of the immunoglobulin molecule that binds to its target, i.e., the antigen
binding region. Some of the constant region of the immunoglobulin may be included.
As will be understood by those skilled in the art, the regions or epitopes of a CEG
polypeptide to which an antibody is directed may vary with the intended application. For
example, antibodies intended for use in an immunoassay for the detection of
membrane-bound CEG proteins on viable bacterial cells should be directed to an
accessible epitope on membrane-bound CEG proteins. Antibodies that recognize other
epitopes may be useful for the identification of CEG protein within damaged or dying
cells, or for the detection of secreted CEG protein or fragments thereof.

Various methods for the preparation of antibodies are well known in the art. For example,
antibodies may be prepared by immunizing a suitable mammalian host using a CEG protein,
peptide, or fragment, in isolated or immunoconjugated form (Harlow, 1989 Antibodies, Cold
Spring Harbor Press, NY). In addition, fusion proteins comprising CEG polypeptides may
also be used, such as a CEG protein/GST-fusion protein. Cells expressing or overexpressing
a CEG polypeptide may also be used for immunizations. Similarly, any cell engineered to
express CEG protein may be used. This strategy may result in the production of monoclonal
antibodies with enhanced capacities for recognizing endogenous CEG protein.

The present invention contemplates chimeric antibodies that comprise a human and a
non-human immunoglobulin portion. The antigen combining region (variable region) of a
chimeric antibody can be derived from a prokaryotic source (e.g., bacteria), and the
constant region of the chimeric antibody, which confers biological effector function to the
immunoglobulin, can be derived from a eukaryotic source (e.g., human). The chimeric
antibody should have the antigen binding specificity of the prokaryotic antibody
molecule and the effector function conferred by the eukaryotic antibody molecule.

In one example, the procedure used to produce chimeric antibodies can involve the 
following steps: 

a) Identifying and cloning the correct immunoglobulin gene segment encoding the
antigen binding portion of the antibody molecule. This gene segment is known as
the VDJ (variable, diversity, and joining) region for heavy chains, the VJ (variable and
joining) region for light chains, or simply as the V or variable region. These gene
regions may be in either the cDNA or genomic form;

b) Cloning the gene segments encoding the constant region or desired part thereof; 

c) Ligating the variable region with the constant region so that the complete chimeric 
antibody is encoded in a form that can be transcribed and translated; 

d) Ligating this construct into a vector containing a selectable marker and gene control
regions such as promoters, enhancers and poly(A) addition signals; 

e) Amplifying this construct in bacteria; 

f) Introducing this DNA into eukaryotic cells (transfection), most often mammalian
lymphocytes;

g) Selecting for cells expressing the selectable marker; 

h) Screening for cells expressing the desired chimeric antibody; and 

i) Testing the antibody for appropriate binding specificity and effector functions.

Chimeric antibodies of several distinct antigen binding specificities have been produced
by protocols well known in the art, including anti-TNP antibodies (Boulianne et al., 1984
Nature 312:643) and anti-tumor antigen antibodies (Sahagan et al., 1986 J. Immunol.
137:1066). Likewise, several different effector functions have been achieved by linking
new sequences to those encoding the antigen binding region. Examples of these include
enzymes (Neuberger et al., 1984 Nature 312:604), and immunoglobulin constant regions
from another species and constant regions of another immunoglobulin chain (Sharon et
al., 1984 Nature 309:364; Tan et al., 1985 J. Immunol. 135:3565-3567). Additionally,
procedures for modifying antibody molecules and for producing chimeric antibody
molecules using homologous recombination to target gene modification have been
described (Fell et al., 1989 Proc. Natl. Acad. Sci. USA 86:8507-8511).

The predicted amino acid sequence of a CEG protein may be used to select specific regions
of the CEG protein for generating antibodies. For example, hydrophobicity and
hydrophilicity analyses of a CEG polypeptide may be used to identify hydrophobic and
hydrophilic regions in the CEG protein. Regions of the CEG protein that show
immunogenic structure, as well as other regions and domains, can readily be identified using
various other methods known in the art, such as Chou-Fasman, Garnier-Robson,
Kyte-Doolittle, Eisenberg, Karplus-Schulz, or Jameson-Wolf analysis. Fragments that
include the immunogenic regions are particularly suited for generating specific classes of
antibodies.
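A Kyte-Doolittle hydropathy profile, one of the analyses listed above, averages per-residue hydropathy values over a sliding window; windows with negative averages mark hydrophilic stretches that tend to be surface-exposed and are therefore candidate immunogenic regions. The Python sketch below uses the published Kyte-Doolittle scale. The window of 9 residues, the threshold of 0, and the function names are illustrative assumptions; longer windows (around 19 residues) are more commonly used when looking for membrane-spanning segments.

```python
# Kyte-Doolittle hydropathy scale (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Mean Kyte-Doolittle score for every full window along the sequence.

    Residues outside the 20 standard amino acids raise KeyError.
    """
    seq = protein.upper()
    if window < 1 or len(seq) < window:
        return []
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window for i in range(len(seq) - window + 1)]


def hydrophilic_windows(protein, window=9, threshold=0.0):
    """(start, end) 0-based, half-open spans of windows scoring below `threshold`."""
    profile = hydropathy_profile(protein, window)
    return [(i, i + window) for i, score in enumerate(profile) if score < threshold]


# Hypothetical sequence: a hydrophobic stretch followed by a charged stretch.
print(hydrophilic_windows("IIIIIIIII" + "KKKKKKKKK"))
# [(5, 14), (6, 15), (7, 16), (8, 17), (9, 18)]
```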



Methods for preparing a protein for use as an immunogen and for preparing immunogenic
conjugates of a protein with a carrier such as BSA, KLH, or other carrier proteins are well
known in the art. In some circumstances, direct conjugation using, for example,
carbodiimide reagents may be used; in other instances, linking reagents such as those
supplied by Pierce Chemical Co., Rockford, IL, may be effective. Administration of a CEG
immunogen is generally conducted by injection over a suitable time period and with use of
a suitable adjuvant, as is generally understood in the art. During the immunization
schedule, antibody titers can be measured to determine the adequacy of polyclonal
antibody formation.
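Antibody titers taken during immunization are often summarized as an endpoint titer, the highest serum dilution that still gives a signal above a background cutoff. The Python sketch below assumes ELISA-style readings and a cutoff of the pre-immune mean plus three standard deviations; both the cutoff rule and the function names are assumptions, since conventions vary between laboratories, and the readings are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Optional


def cutoff_from_preimmune(preimmune: List[float], n_sd: float = 3.0) -> float:
    """Positivity cutoff: mean + n_sd * sample SD of pre-immune readings (assumed rule).

    Requires at least two pre-immune readings for the standard deviation.
    """
    return mean(preimmune) + n_sd * stdev(preimmune)


def endpoint_titer(dilutions: List[int], signals: List[float], cutoff: float) -> Optional[int]:
    """Highest reciprocal dilution whose signal exceeds `cutoff`, or None.

    Positives are not required to be contiguous; some protocols instead stop
    at the first dilution that falls below the cutoff.
    """
    if len(dilutions) != len(signals):
        raise ValueError("each dilution needs exactly one reading")
    titer = None
    for dilution, signal in sorted(zip(dilutions, signals)):
        if signal > cutoff:
            titer = dilution
    return titer


# Hypothetical two-fold dilution series and readings, for illustration only.
dilutions = [100, 200, 400, 800, 1600, 3200]
immune = [2.1, 1.8, 1.2, 0.6, 0.25, 0.1]
cutoff = cutoff_from_preimmune([0.08, 0.10, 0.12])  # about 0.16
print(endpoint_titer(dilutions, immune, cutoff))     # 1600
```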

While the polyclonal antisera produced in this way may be satisfactory for some
applications, monoclonal antibody preparations are preferred for pharmaceutical
compositions. Immortalized cell lines that secrete a desired monoclonal antibody may be
prepared using the standard method of Kohler and Milstein (Nature 256:495-497) or other
techniques as described in Monoclonal Antibodies: A Manual of Techniques, CRC Press,
Inc., Boca Raton, Fla. (1987), ed. Zola. The immortalized cell lines secreting the desired
antibodies are screened by immunoassay in which the antigen is the CEG polypeptide
having binding activity, or a fragment thereof. When the appropriate immortalized cell
culture secreting the desired antibody is identified, the cells can be cultured either in vitro or
by production in ascites fluid.

The desired monoclonal antibodies are then recovered from the culture supernatant or from
the ascites supernatant. Fragments of the monoclonal antibodies of the invention or of the
polyclonal antisera (e.g., Fab, F(ab')2, Fv fragments, fusion proteins) that contain the
immunologically significant portion (i.e., a portion that recognizes and binds a CEG protein)
can be used as antagonists, as can the intact antibodies. Humanized antibodies directed
against a CEG polypeptide are also useful. The advantage of using humanized antibodies is
that they are less immunogenic in humans. As used herein, a humanized antibody is an
immunoglobulin molecule that is capable of binding to a CEG polypeptide and that
comprises a framework region (FR) having substantially the amino acid sequence of a
human immunoglobulin and a CDR having substantially the amino acid sequence of a
non-human immunoglobulin or a sequence engineered to bind a CEG protein. Methods for
humanizing murine and other non-human antibodies by substituting one or more of the
non-human antibody CDRs for corresponding human antibody sequences are well known
(Jones et al., 1986 Nature 321:522-525; Riechmann et al., 1988 Nature 332:323-327;
Verhoeyen et al., 1988 Science 239:1534-1536; Carter et al., 1993 Proc. Natl. Acad. Sci.
USA 89:4285; and Sims et al., 1993 J. Immunol. 151:2296).

Use of immunologically reactive fragments, such as the Fab, Fab', or F(ab')2 fragments, is
often preferable, especially in a therapeutic context, as these fragments are generally less
immunogenic than the whole immunoglobulin. Further, bispecific antibodies specific for
two or more epitopes may be generated using methods generally known in the art. In
addition, antibody effector functions may be modified so as to enhance the therapeutic
effect of the antibodies of the invention. For example, cysteine residues may be engineered
into the Fc region, permitting the formation of interchain disulfide bonds and the generation
of homodimers that may have enhanced capacities for internalization, ADCC, and/or
complement-mediated cell killing (Caron et al., 1992 J. Exp. Med. 176:1191-1195;
Shopes, 1992 J. Immunol. 148:2918-2922). Homodimeric antibodies may also be
generated by cross-linking techniques known in the art (Wolff et al., Cancer Res.
53:2560-2565). The invention also provides pharmaceutical compositions having the
monoclonal antibodies or anti-idiotypic monoclonal antibodies of the invention.

The antibodies or fragments may also be produced by recombinant means using current
technology. Regions that bind specifically to the desired regions of the CEG protein can
also be produced in the context of chimeric or CDR-grafted antibodies of multiple species
origin. The invention includes an antibody, e.g., a monoclonal antibody, that competitively
inhibits the immunospecific binding of any of the monoclonal antibodies of the invention to
a CEG protein.
Alternatively, methods for producing fully human monoclonal antibodies, including phage
display and transgenic methods, are known and may be used for the generation of human
monoclonal antibodies (reviewed in: Vaughan et al., 1998 Nature Biotechnology
16:535-539). For example, fully human monoclonal antibodies may be generated using
cloning technologies employing large human Ig gene combinatorial libraries (i.e., phage
display) (Griffiths and Hoogenboom, "Building an in vitro immune system: human
antibodies from phage display libraries", in: Protein Engineering of Antibody Molecules for
Prophylactic and Therapeutic Applications in Man, Clark, M. (Ed.), Nottingham Academic,
pp 45-64 (1993); Burton and Barbas, "Human Antibodies from combinatorial libraries",
pp 65-82). Fully human monoclonal antibodies may also be produced using transgenic
mice engineered to contain human immunoglobulin gene loci as described in PCT Patent
Application WO98/24893, Jakobovits et al., published December 3, 1997 (see also
Jakobovits, 1998 Exp. Opin. Invest. Drugs 7:607-614). This method avoids the in vitro
manipulation required with phage display technology and efficiently produces
high-affinity, authentic human antibodies.

The antibody or fragment thereof of the invention may be labeled with a detectable
marker or conjugated to a second molecule, such as a therapeutic agent (e.g., a cytotoxic
agent), thereby resulting in an immunoconjugate. For example, the therapeutic agent
includes, but is not limited to, an anti-tumor drug, a toxin, a radioactive agent, a cytokine,
a second antibody, or an enzyme. Further, the invention provides an embodiment wherein
the antibody of the invention is linked to an enzyme that converts a prodrug into a
cytotoxic drug.

Examples of cytotoxic agents include, but are not limited to, ricin, ricin A-chain,
doxorubicin, daunorubicin, taxol, ethidium bromide, mitomycin, etoposide, teniposide,
vincristine, vinblastine, colchicine, dihydroxy anthracin dione, actinomycin D, diphtheria
toxin, Pseudomonas exotoxin (PE) A, PE40, abrin, abrin A chain, modeccin A chain,
alpha-sarcin, gelonin, mitogellin, restrictocin, phenomycin, enomycin, curicin, crotin,
calicheamicin, Saponaria officinalis inhibitor, glucocorticoid and other
chemotherapeutic agents, as well as radioisotopes of Bi, In, Y, and Re.

Suitable detectable markers for diagnostic use include, but are not limited to, a
radioisotope, a fluorescent compound, a bioluminescent compound, a chemiluminescent
compound, a metal chelator, or an enzyme. Antibodies may also be conjugated to an
anti-cancer prodrug-activating enzyme capable of converting the prodrug to its active form.
See, for example, U.S. Patent Nos. 4,952,394 and 5,716,990.

Additionally, a recombinant protein of the invention comprising the antigen-binding 
region of any of the monoclonal antibodies of the invention can be made. In such a 
situation, the antigen-binding region of the recombinant protein is joined to at least a 
functionally active portion of a second protein having therapeutic activity. The second 
protein can include, but is not limited to, an enzyme, lymphokine, oncostatin or toxin. 
Suitable toxins include those described above. 

Techniques for conjugating or joining therapeutic agents to antibodies are well known
(Arnon et al., "Monoclonal Antibodies For Immunotargeting Of Drugs In Cancer Therapy",
in: Monoclonal Antibodies And Cancer Therapy, Reisfeld et al. (eds.), pp. 243-56, Alan R.
Liss, Inc., 1985; Hellstrom et al., "Antibodies For Drug Delivery", in: Controlled Drug
Delivery (2nd Ed.), Robinson et al. (eds.), pp. 623-53, Marcel Dekker, Inc., 1987; Thorpe,
"Antibody Carriers Of Cytotoxic Agents In Cancer Therapy: A Review", in: Monoclonal
Antibodies '84: Biological And Clinical Applications, Pinchera et al. (eds.), pp. 475-506
(1985); and Thorpe et al., "The Preparation And Cytotoxic Properties Of Antibody-Toxin
Conjugates", Immunol. Rev. 62:119-58 (1982)). Techniques for joining detectable
markers to antibodies are also known.

D) PHARMACEUTICAL COMPOSITIONS OF THE INVENTION 

The invention includes pharmaceutical compositions for use in the treatment of microbial 
infections comprising a pharmaceutically effective amount of an anti-CEG antibody or a 
CEG polypeptide. 
In one embodiment, the pharmaceutical compositions may comprise a CEG antibody,
either unmodified, conjugated to a therapeutic agent (e.g., drug, toxin, enzyme or second 
antibody) or in a recombinant form (e.g., chimeric or bispecific). The compositions may 
additionally include other antibodies or conjugates (e.g., an antibody cocktail). 

The pharmaceutical compositions also preferably include suitable carriers and adjuvants,
which include any material that, when combined with the molecule of the invention
(e.g., an anti-CEG antibody or a CEG protein), retains the molecule's activity and is
non-reactive with the subject's immune system. Examples of suitable carriers and adjuvants
include, but are not limited to, human serum albumin, ion exchangers, alumina, lecithin,
buffer substances such as phosphates, glycine, sorbic acid, potassium sorbate, and salts or
electrolytes such as protamine sulfate. Other examples include any of the standard
pharmaceutical carriers such as a phosphate buffered saline solution, water, emulsions
such as oil/water emulsions, and various types of wetting agents. Other carriers may also
include sterile solutions, tablets (including coated tablets), and capsules. Typically, such
carriers contain excipients such as starch, milk, sugar, certain types of clay, gelatin,
stearic acid or salts thereof, magnesium or calcium stearate, talc, vegetable fats or oils,
gums, glycols, or other known excipients. Such carriers may also include flavor and
color additives or other ingredients. Compositions comprising such carriers are
formulated by well-known conventional methods. Such compositions may also be
formulated within various lipid compositions, such as liposomes, as well as
in various polymeric compositions, such as polymer microspheres.

The pharmaceutical compositions of the invention can be administered using 
conventional modes of administration including, but not limited to, intravenous, 
intraperitoneal, oral, intralymphatic or administration directly into the tumor. 
Intravenous administration is preferred. 

The pharmaceutical compositions of the invention may be in a variety of dosage forms 
which include, but are not limited to, liquid solutions or suspensions, tablets, pills, 
powders, suppositories, polymeric microcapsules or microvesicles, liposomes, and 
injectable or infusible solutions. The preferred form depends upon the mode of
administration and the therapeutic application. 

The CEG polypeptides and proteins of this invention are found in common pathogenic 
bacterial species such as Streptococcus pneumoniae. This organism causes upper 
respiratory tract infections. Thus, the peptides and proteins of this invention can be used 
as immunogens in subunit vaccines for vaccination against pathogenic bacteria such as
Streptococcus pneumoniae. Additionally, the ceg sequences of the invention can be used 
as DNA vaccines (U.S. Patent No. 5,736,524 and U.S. Patent No. 5,989,553). 

The polypeptides and proteins of this invention can be formulated as univalent and
multivalent vaccines. The protein can be mixed, conjugated or fused with other antigens, 
including B or T cell epitopes of other antigens. 

Further, when a haptenic peptide of the proteins of the invention is used (i.e., a peptide
that reacts with cognate antibodies but cannot itself elicit an immune response), it can
be conjugated to an immunogenic carrier molecule. Conjugation to an immunogenic
carrier can render the oligopeptide immunogenic. Examples of carrier molecules are
tetanus toxin or toxoid, diphtheria toxin or toxoid, and any mutant forms of these proteins,
such as CRM197. Others include exotoxin A of Pseudomonas, the heat-labile toxin
of E. coli, and rotaviral particles (including rotavirus and VP6 particles). Alternatively, a
fragment or epitope of the carrier protein or other immunogenic protein can be used. For
example, the hapten can be coupled to a T cell epitope of a bacterial toxin.

In formulating the vaccine compositions with the CEG polypeptides or proteins of the
invention, alone or in the various combinations described, the immunogen is adjusted to
an appropriate concentration and formulated with any suitable vaccine adjuvant. Suitable
adjuvants include, but are not limited to: surface-active substances, e.g., hexadecylamine,
octadecylamine, octadecyl amino acid esters, lysolecithin, dimethyldioctadecylammonium
bromide, methoxyhexadecylglycerol, and pluronic polyols; polyanions, e.g., pyran,
dextran sulfate, poly IC, and carbopol; peptides, e.g., muramyl dipeptide, dimethylglycine,
and tuftsin; oil emulsions; mineral gels, e.g., aluminum hydroxide and aluminum
phosphate; and immune stimulating complexes. The immunogen may also be incorporated
into liposomes, or conjugated to polysaccharides and/or other polymers.

The vaccines can be administered to a human or animal in a variety of ways. These 
include intradermal, intramuscular, intraperitoneal, intravenous, subcutaneous, oral and 
intranasal routes of administration. Further, the vaccines can be live or inactivated 
vaccines. 

The most effective mode of administration and dosage regimen for the compositions of 
this invention depends upon the severity and course of the disease, the patient's health 
and response to treatment and the judgment of the treating physician. Accordingly, the 
dosages of the compositions should be titrated to the individual patient. 

E) USES OF THE MOLECULES OF THE INVENTION 

1) MOLECULAR WEIGHT MARKERS 

The nucleic acid molecules of the invention and their encoded proteins may be employed 
as molecular weight markers. For example, the molecular weight of each of the nucleic 
acid molecules having ceg sequences and their predicted polypeptides can be determined 
and can be used to compare against other gene sequences and proteins whose molecular 
weights are unknown. 
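As a rough illustration of how marker sizes might be computed and used, the Python sketch below derives an average molecular weight for a polypeptide from its one-letter sequence, approximates the mass of a double-stranded DNA molecule from its length, and interpolates an unknown band from a log-linear standard curve. The residue masses are standard average values, the 607.4 Da/bp + 157.9 Da DNA formula is a commonly quoted approximation, and all function names are hypothetical; none of this is taken from the source.

```python
import math

# Average residue masses in daltons (amino acid minus one water molecule).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the free N- and C-termini


def protein_mw(seq: str) -> float:
    """Average molecular weight (Da) of an unmodified linear polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def dsdna_mw(length_bp: int) -> float:
    """Approximate MW (Da) of double-stranded DNA: 607.4 Da per bp + 157.9 Da."""
    return 607.4 * length_bp + 157.9


def fit_standard_curve(markers):
    """Least-squares fit of log10(MW) = slope * mobility + intercept.

    `markers` is a list of (relative_mobility, known_mw) pairs from a marker lane.
    """
    xs = [m for m, _ in markers]
    ys = [math.log10(mw) for _, mw in markers]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(mobility: float, slope: float, intercept: float) -> float:
    """Interpolate the MW of an unknown band from its relative mobility."""
    return 10 ** (slope * mobility + intercept)
```

In this sketch, a marker lane of CEG polypeptides of known sequence would supply the `(mobility, protein_mw(seq))` pairs, and the unknown band's mobility is passed to `estimate_mw`. The log-linear fit is only trustworthy within the range spanned by the markers.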

2) DIAGNOSTICS 

The nucleic acid molecules of the invention may be employed in diagnostic 
embodiments. For example, the presence of nucleotide sequences that are identical or
similar to the ceg sequences of the invention may be detected within a biological sample,
such as blood, serum, or a swab from the nose, ear, or throat, by means of a nucleic acid
detection assay.

Nucleic acid probes or primers having sequences complementary to ceg sequences may
be used in a hybridization assay to detect the presence of sequences that are
identical or similar to the ceg sequences of the invention in biological samples.
Typically, nucleic acid molecules obtained from a suitable biological sample are
hybridized with labeled probes or primers. The resulting hybridized molecules are
detected and resolved by methods well known in the art, such as Northern or Southern
blotting, microarray technology, or amplification by PCR. Other
hybridization techniques and systems are known that can be used in connection with the
detection aspects of the invention, including diagnostic assays such as those described in
Falkow et al., U.S. Pat. No. 4,358,535.
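The sketch below illustrates, under simplifying assumptions, the sequence logic behind such a hybridization assay: a probe anneals where a strand carries the probe's reverse complement, and "similar" sequences can be modeled by tolerating a small number of mismatches. The Wallace rule (2 °C per A/T, 4 °C per G/C) is a swapped-in rough estimate valid only for short oligonucleotides, and all function names are hypothetical.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_probe_sites(target: str, probe: str,
                     max_mismatches: int = 0) -> List[Tuple[int, str, int]]:
    """Scan one strand of `target` for sites where `probe` could anneal.

    Returns (position, strand, mismatches). "given" means the probe pairs with the
    supplied strand (which carries the probe's reverse complement); "opposite" means
    it pairs with the complementary strand (the supplied strand carries the probe itself).
    """
    target, probe = target.upper(), probe.upper()
    rc = reverse_complement(probe)
    k = len(probe)
    hits = []
    for i in range(len(target) - k + 1):
        window = target[i:i + k]
        mm = mismatches(window, rc)
        if mm <= max_mismatches:
            hits.append((i, "given", mm))
        mm = mismatches(window, probe)
        if mm <= max_mismatches:
            hits.append((i, "opposite", mm))
    return hits


def wallace_tm(oligo: str) -> int:
    """Rough melting temperature (deg C) for short oligos: 2 per A/T plus 4 per G/C."""
    s = oligo.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))
```

Raising `max_mismatches` loosens the match in the same way that lowering hybridization stringency admits related, rather than identical, sequences.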

Examples of PCR technology are disclosed in U.S. Patent Nos. 4,683,202 and
4,965,188 (incorporated herein by reference). Generally, nucleic acid molecules are
obtained from a suitable biological source and contacted with two primers corresponding
to the ceg sequences disclosed herein, under conditions that allow hybridization and
polymerization to occur. A pair of primers, one corresponding to the 5' flanking region
and the other corresponding to the 3' flanking region, would be sufficient to detect the
nucleic acid molecules of the invention in a biological sample and may be used to
indicate the amount of bacteria present.
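A minimal in silico PCR, assuming exact primer matches on a single template strand, shows how a primer pair flanking a region defines the product: the forward primer matches the top strand as written, and the reverse primer's reverse complement must occur downstream of it. The helper names are hypothetical, and practical primer design would also weigh mismatches, melting temperature, and secondary structure.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def in_silico_pcr(template: str, forward: str, reverse: str) -> Optional[str]:
    """Predict the amplicon from one template strand, assuming exact primer matches.

    Both primers are written 5'->3'. The forward primer must occur on the template
    as written; the reverse primer anneals to the template, so its reverse
    complement must occur downstream of the forward site.
    """
    t = template.upper()
    start = t.find(forward.upper())
    if start == -1:
        return None
    rev_site = reverse_complement(reverse)
    end = t.find(rev_site, start + len(forward))
    if end == -1:
        return None
    return t[start:end + len(rev_site)]
```

Returning `None` corresponds to a negative result for that primer pair; a returned string is the predicted product, whose length could be checked against a band on a gel.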

Alternative methods of detecting nucleic acid molecules include, for example, in situ 
hybridization techniques, where a ceg probe is used to detect homologous sequences 
within one or more cells, such as cells within a clinical sample or even cells grown in 
tissue culture. As is well known in the art, the cells are prepared for hybridization by
fixation, e.g., chemical fixation, and placed in conditions that allow for the hybridization
of a detectable probe with nucleic acids located within the fixed cell. 
The amount of ceg sequences present in a biological sample can be quantified and
compared to the levels in a normal or "healthy" sample. For example, ceg sequences 
present in either increased or decreased levels, compared to the levels found in the 
control sample may indicate the presence of bacteria. This information is useful for 
diagnosis of a bacterial infection that requires treatment with an antibacterial agent. 
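One common way to quantify a target sequence is real-time PCR with a standard curve; this technique is swapped in here for illustration and is not specified by the source. The sketch assumes a dilution series of known copy numbers, fits Ct against log10(copies), converts a sample Ct back to a copy number, and flags a sample whose level differs from a control by at least a hypothetical two-fold threshold in either direction.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def standard_curve(known_copies, cts):
    """Fit Ct = slope * log10(copies) + intercept from a dilution series."""
    return fit_line([math.log10(c) for c in known_copies], cts)


def amplification_efficiency(slope):
    """Efficiency implied by the slope; about 1.0 (100%) when slope is about -3.32."""
    return 10 ** (-1 / slope) - 1


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate starting copies in a sample."""
    return 10 ** ((ct - intercept) / slope)


def differs_from_control(sample_copies, control_copies, threshold=2.0):
    """True if the sample is at least `threshold`-fold above or below the control."""
    fold = sample_copies / control_copies
    return fold >= threshold or fold <= 1 / threshold
```

Checking the efficiency before trusting the curve is a design choice here: a slope far from about -3.3 suggests the dilution series, rather than the sample, is the problem.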

Alternatively, the amount of CEG polypeptides present in a biological sample may be
determined by means of an immunoassay. For example, labeled antibodies reactive
against CEG polypeptides may be used in an immuno-reactive assay to detect the 
presence of CEG polypeptides in the biological samples. 

3) SCREENING CANDIDATE CEG SEQUENCES 

a) Gene Disruption Assay 

The ceg nucleotide sequences of the invention can be used to identify nucleotide 
sequences which are identical or similar to the ceg sequences that are required for 
bacterial cell viability. For example, the ceg sequences can be used in a bacterial gene 
disruption assay to screen candidate nucleotide sequences to identify sequences required 
for bacterial cell viability. 

The disruption assay can involve: introducing into a host cell a recombinant vector that is
capable of integration into the host genome, where the recombinant vector includes a
candidate sequence that putatively encodes a cell-viability gene product (e.g., the
exogenous ceg sequence); allowing the vector to integrate the candidate sequence into a
target sequence within the host's genome (e.g., the endogenous ceg sequence); and
screening the host cell, so introduced, for viability. The recombinant vector preferably
includes a selectable marker so that the introduced host cell can be screened for viability
in the presence of a selectable agent.
For example, Figure 1 shows a schematic representation of a gene disruption assay
within a bacterial host cell. In Figure 1A, the recombinant vector, pEVP3, includes the
CAT gene (e.g., the selectable marker chloramphenicol acetyl transferase) and an internal
region of the ceg disrupting sequence; the internal region excludes the 5' and 3' ends of
the ceg sequence. The "X" in Figure 1 indicates the recombinant pEVP3 vector undergoing
homologous recombination with the target sequence (e.g., within the host genome). In
Figure 1B, the resolved pEVP3 vector integrated into the host genome is shown.
From left to right are the following elements: the native promoter of the target gene; a 5'
partial copy of the target gene; the body of the integrated pEVP3 vector, including the
disrupting gene and CAT; and a 3' partial copy of the target gene. Thus, integration of the
pEVP3 vector via homologous recombination results in two partial gene duplications
flanking the integrated vector. If the target gene is not essential for survival, it is possible
to recover chloramphenicol-resistant colonies of S. pneumoniae. Failure to recover
chloramphenicol-resistant colonies, in the presence of the proper controls as described
below, indicates that the target gene may be essential for cell viability.

More particularly, the gene disruption assay for screening candidate ceg sequences can
involve the following steps. The recombinant pEVP3 vector, encoding CAT and carrying
a fragment of a candidate ceg sequence, can be introduced into
transformation-competent S. pneumoniae cells by methods that are well known in the art
(Lee, M.S., et al., 1998 Appl. Environ. Microbiol. 64:4796-4802). The preferred size of
the ceg fragment is between about 200 and about 500 bp in length. It is advantageous
that the candidate ceg fragment not include the 5' and 3' ends that encode the
N- and C-terminal ends of the CEG polypeptide. This ensures that neither the inserted ceg
fragment nor the disrupted endogenous ceg gene sequence is capable of expressing a
full-length, functional ceg gene product. The transformation-competent cells can be obtained
by performing the transformation step in the presence of a heptadecapeptide that induces
competence for transformation of S. pneumoniae (Havarstein, L. S., et al., 1995 Proc.
Natl. Acad. Sci. 92:11140-11144), such as the CSP-1 peptide. The CSP-1 peptide can be
naturally derived or synthetic. Additionally, the transformation step can be optimized by
performing the transformation when the cells have reached a density that is optimal for
transformation (e.g., 3 X 10^ cells per ml) (Havarstein, L. S., et al., supra). The
recombinant vector can be introduced into the competent pneumococci and may undergo
homologous recombination, whereby the candidate ceg fragment recombines with the
corresponding endogenous ceg sequence, resulting in targeted integration of the vector
into the pneumococcal genome and disruption of the endogenous ceg gene.
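The fragment-selection rule described above can be expressed as a small helper. This is a hypothetical sketch, not the authors' procedure: it assumes 0-based, half-open coordinates within the candidate open reading frame, uses the 200 to 500 bp window stated in the text, and introduces an illustrative 50 bp margin (an assumption) so that neither the 5' nor the 3' end of the coding sequence is included.

```python
from typing import Optional, Tuple


def internal_fragment(orf_length: int, min_len: int = 200, max_len: int = 500,
                      end_margin: int = 50) -> Optional[Tuple[int, int]]:
    """Pick a centered internal fragment of a candidate ORF for a disruption vector.

    Returns 0-based, half-open (start, end) coordinates, or None if the ORF is too
    short to yield a fragment of at least `min_len` bp while leaving `end_margin` bp
    (a hypothetical margin) untouched at both the 5' and 3' ends.
    """
    usable = orf_length - 2 * end_margin
    if usable < min_len:
        return None
    frag_len = min(max_len, usable)
    start = (orf_length - frag_len) // 2  # center the fragment within the ORF
    return start, start + frag_len
```

For a 1,200 bp ORF this returns `(350, 850)`, a 500 bp central fragment that leaves both ends of the gene out of the insert, so a single crossover cannot reconstitute a full-length copy.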

The transformed cells can be plated on or cultured in chloramphenicol-containing growth
medium. The cells can be cultured under standard conditions, such as 37° C in 5% CO2
for approximately 40 to 48 hours, for the purpose of selecting cells that carry the
integrated vector.

Additionally, control samples can be run in parallel with the gene disruption assay in
order to determine whether the gene disruption procedure is working properly. For
example, the control samples can be used to calibrate the gene disruption experiment so
that disruption of a known non-essential bacterial gene results in an approximate number
of colonies per plate. Similarly, the disruption of a known essential gene can be
calibrated to yield only zero or one colony per plate. The appearance of one colony is
due to rare illegitimate recombination into a non-homologous sequence. In particular,
a known non-essential gene such as the lytA gene (Tomasz, A., et al., 1988 J. Bacteriol.
170:5931-5934) can be used so that between about 70 and 100 chloramphenicol-resistant
colonies grow per plate. Similarly, the ftsZ gene (Lutkenhaus, J. F., et al., 1980 J.
Bacteriol. 143:1281-1288), a known essential gene, can be used to yield zero or, rarely,
one colony per plate. As is well known in the art, specific parameters involved in
any given gene disruption assay can be adjusted to calibrate the desired number of plated
cells in the control samples. Experimental parameters that can be adjusted include, but
are not limited to, the E. coli strain used to propagate the vector/insert, the fragment
length of the sequence to be integrated, the amount of recombinant integration vector
used to transform the cells, the use of transformation-competent cells, and the plating
density of the transformed cells.

The transformed cells carrying the recombinant integration vector that disrupts
expression of an endogenous essential gene (e.g., the target ceg gene) can be identified
based on a selectable phenotype such as non-viability. For example, cells that carry a
disrupted non-essential gene will be viable and, due to the integration of pEVP3, will
grow on chloramphenicol-containing medium. In contrast, cells that carry a disrupted
essential gene will not grow (i.e., are non-viable) on the chloramphenicol-containing
medium. Thus, transformed cells that fail to grow under these selective conditions
carry an endogenous gene sequence that is essential for cell viability and that has been
disrupted by an exogenous candidate fragment, thereby identifying a ceg sequence. Steps
one through three may be repeated in order to confirm that the ceg sequences so
identified are essential for cell viability.
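The interpretation of colony counts against the lytA and ftsZ controls can be sketched as follows. The 70 to 100 colony window for the non-essential control and the zero-or-one colony ceiling for the essential control come from the text above; the `Plate` record, the function names, and the rule of requiring agreement across replicate runs are hypothetical choices made for illustration.

```python
from typing import List, NamedTuple


class Plate(NamedTuple):
    """Chloramphenicol-resistant colony counts from one transformation run."""
    candidate: int  # plate transformed with the candidate ceg fragment
    lyta: int       # non-essential control (lytA fragment)
    ftsz: int       # essential control (ftsZ fragment)


def run_is_valid(p: Plate, lyta_range=(70, 100), ftsz_max=1) -> bool:
    """Controls behave as calibrated: lytA gives about 70-100 colonies, ftsZ 0 or 1."""
    lo, hi = lyta_range
    return lo <= p.lyta <= hi and p.ftsz <= ftsz_max


def classify(p: Plate, essential_max=1) -> str:
    """Call one valid run; 0 or 1 colony (rare illegitimate recombination) suggests essentiality."""
    return "putatively essential" if p.candidate <= essential_max else "non-essential"


def classify_replicates(plates: List[Plate]) -> str:
    """Combine replicate runs, ignoring runs whose controls failed."""
    calls = {classify(p) for p in plates if run_is_valid(p)}
    return calls.pop() if len(calls) == 1 else "inconclusive"
```

Discarding runs with failed controls, rather than scoring them, reflects the point that absence of colonies means nothing unless the lytA control shows the transformation itself worked.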

b) Autolysin Assay 

It is advantageous to perform additional steps to determine whether the homologous
recombination events result in disruption of the intended target gene sequence. The lytA
transformation control can be used to confirm that the transformation system is
functioning properly. For example, a phenotypic test for autolysin activity (the lytA gene
product) can be performed to determine whether the exogenous lytA fragment is correctly
integrated into the lytA site within the host genome. This typically involves flooding the
culture plates containing transformants carrying the integrated lytA control vector with a
solution of detergent, such as 0.1% deoxycholate, which triggers cell lysis in lytA-intact
cells (e.g., cells that have not undergone homologous recombination). After about 5-10
minutes, colonies with intact lytA will appear ghost-like due to cell lysis, and
colonies with a disrupted lytA gene will appear intact.

c) Polarity Analysis 

The ceg sequences that are confirmed to be essential for cell viability can be examined
further by performing a polarity analysis to determine whether the corresponding endogenous
ceg sequence is organized in an operon. Polarity is an effect unique to prokaryotes and is
the result of the operon organization of bacterial genomes. Many bacterial genes are
arranged in operons in which multiple genes are under the control of a single regulatory
sequence (e.g., a promoter) and are transcribed into a single mRNA transcript. With
respect to the orientation of multiple genes within an operon, the genes that are proximal
to the regulatory sequence are said to be "upstream" genes and the genes that are distal
are said to be "downstream" genes. For example, many operons contain genes encoding
different proteins that catalyze discrete steps of a common biochemical pathway. Thus,
any of the proteins that catalyze the steps of the pathway may be essential for cell
viability.

The presence of operons in a bacterial host genome may influence the interpretation of
the gene disruption results. For example, the lethality caused by disrupting an upstream
gene may be erroneously attributed to loss of the disrupted gene's expression when it is,
in fact, due to an effect on the expression of the intact downstream genes. Therefore, it is
advantageous to perform a polarity analysis to determine whether a ceg sequence is part of an operon.

A polarity analysis can involve performing an in vivo gene disruption procedure using, as
the disrupting sequence, a ceg sequence that includes the entire ceg coding region
but lacks expression regulatory sequences. This differs from the gene disruption
assay, which uses the central region of the ceg sequence. The polarity analysis
involves gene duplication via homologous recombination. For example, the pEVP3
vector carrying the entire coding region of a ceg sequence can be used for the polarity
analysis (Figure 2A). The polarity analysis will yield different results depending on the
organization of the endogenous target sequence within the host genome.

For example, Figure 2 shows a schematic representation of the polarity test for operons
within a bacterial host cell. In Figure 2A, the recombinant vector, pEVP3, includes the
CAT gene and the entire coding region of the ceg disrupting sequence. The "X" in Figure
2 indicates the recombinant pEVP3 vector undergoing homologous recombination with the
target sequence. Two of the possible results of homologous recombination are shown in
Figures 2B and 2C.
In Figure 2B, case 1, if the endogenous target sequence is not organized in an operon, the
integration event may yield: a functional target sequence (e.g., one that is capable of
expression); a duplicate non-functional target sequence that lacks a promoter; and a
functional downstream gene (e.g., Gene B) that is controlled by its own promoter. The
cells carrying this type of integrated target sequence can be recovered as viable cells that
grow in the presence of chloramphenicol; this condition is termed "polarity negative".

In Figure 2C, case 2, if the target sequence is organized in an operon, then the integration
event may yield an integration site similar to that described for case 1, including a
functional target sequence and a duplicate, non-functional target sequence. However,
this integration event may also yield a non-functional downstream gene (e.g., Gene B),
because expression of this downstream gene is controlled by a promoter located upstream
of the insertion site. The cells that carry this type of integrated target sequence will be
non-viable; this condition is termed "polarity positive". Thus, the polarity analysis
provides a method to determine whether integration of a recombinant vector into a target
ceg sequence affects expression of downstream genes.
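The decision logic of cases 1 and 2, combined with the preceding gene disruption result, can be summarized in a few lines. This hypothetical helper assumes each test has been reduced to whether chloramphenicol-resistant colonies were recovered under valid controls; the returned labels mirror the "polarity negative" (-) and "polarity positive" (+) designations, with an undetermined state for sequences not yet tested.

```python
from typing import Optional


def interpret(disruption_colonies: bool, polarity_colonies: Optional[bool]) -> str:
    """Combine the internal-fragment disruption result with the full-ORF polarity test.

    disruption_colonies: CmR colonies recovered with the internal-fragment vector?
    polarity_colonies:   CmR colonies recovered with the full-coding-region vector?
                         None if the polarity test has not been run.
    """
    if disruption_colonies:
        return "not essential"
    if polarity_colonies is None:
        return "putatively essential; polarity not determined"
    if polarity_colonies:
        # Downstream genes remain expressed (case 1), so lethality tracks the target gene.
        return "putatively essential; polarity negative (-)"
    # Integration also silenced downstream genes (case 2); they may contribute to lethality.
    return "putatively essential; polarity positive (+)"
```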

The ceg sequences disclosed herein (SEQ ID NOs.: 1-113, 227-331) encode gene
products that are essential for viability in S. pneumoniae. Furthermore, many of these
ceg sequences have been analyzed for the polarity effect, and the results are presented in
Table I. One subset of ceg sequences is classified as polarity negative (-), since the
homologous recombination event did not affect the expression of downstream genes.
Another subset of ceg sequences is classified as polarity positive (+), since the
homologous recombination event did affect the expression of downstream genes. The
ceg sequences that have not yet been classified as polarity positive or negative are
indicated in Table I by a blank. For the ceg sequences that are classified as polarity
positive, the genes downstream of the disrupted endogenous ceg sequences may or may
not also be essential.

4) ASSAYS FOR IDENTIFYING CEG LIGANDS AND OTHER BINDING AGENTS

The present invention provides screening methods for identifying agents, such as ligands,
that interact with and/or bind to the CEG proteins of the invention. An agent can be, for
example, a natural product, a derived or synthetic chemical molecule, a polypeptide, a
nucleic acid molecule, or a metal. Agents that interact with CEG proteins may cause
bacterial cell death by disrupting the functions of CEG proteins, including, but not
limited to, nucleotide biosynthesis, DNA replication, RNA transcription, protein
translation, and/or cell wall biosynthesis. Accordingly, the present invention provides
screening methods for identifying agents having antibacterial activity, such as agents that
cause bacterial cell death by interacting with the CEG proteins. These antibacterial
agents are useful for treating diseases and afflictions associated with bacterial infections.

Various methods can be used to discover agents having antibacterial activity, as 
determined by the ability of the binding agent to bind to a CEG protein and disrupt the 
function of the CEG protein. These screening methods include whole cell in vivo assays 
as well as in vitro assays with cellular components. 

An in vivo screening method for identifying ligands that bind CEG polypeptides can be
performed in a whole cell assay. A typical method uses whole bacterial
cells to assess antibacterial properties based on cell growth or viability. These
methods can include methods for measuring cell growth and/or viability, for example, by
optical density or zones of growth (Koch, A. L. et al., 1970 Anal. Biochem. 38:252-259;
Biemer, J. J. et al., 1973 Ann. Clin. Lab. Sci. 2:135-140; Manual of Clinical
Microbiology, 7th edition, Murray, P. R. (ed), ASM Press), by growth inhibition in an
agar assay (Murray, P. R., supra), or by other means of detecting cell metabolism
(Mychajlonka, M. et al., 1980 Antimicrob. Agents Chemother. 17:572-582), and are well
known to those skilled in the art. In addition, there are molecular biology-based detection
methods for use with whole bacterial cells, such as gene reporter assays, to monitor the
effect of the ligand on specific targets (Slauch, J. M., et al., 1991 Methods Enzymol.
204:213-248). Examples of reporter genes include, but are not limited to,
beta-galactosidase, alkaline phosphatase, luciferase, and green fluorescent protein. For
example, one embodiment provides a reporter system that monitors inhibition of DNA
synthesis by fusing a reporter such as beta-galactosidase (lacZ) to genes known to be
upregulated by the cessation of DNA synthesis as a result of the binding of ligands to the
DNA synthetic apparatus (Shurvinton, C. E., et al., 1982 Mol. Gen. Genetics 185:352-355;
Rosato, A., et al., 1998 Antimicrob. Agents Chemother. 42:1392-1396).
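As an illustration of the optical-density readout, the sketch below converts turbidity readings into percent growth inhibition relative to an untreated culture and reports the lowest concentration at and above which inhibition stays at or above a cutoff, a MIC-style endpoint. The 90% cutoff, the blank correction, and the function names are assumptions for illustration, not parameters from the source or from a clinical standard.

```python
from typing import Dict, Optional


def percent_inhibition(od_treated: float, od_untreated: float,
                       od_blank: float = 0.0) -> float:
    """Percent reduction in blank-corrected OD relative to an untreated culture."""
    growth = (od_treated - od_blank) / (od_untreated - od_blank)
    return 100.0 * (1.0 - growth)


def mic_like_endpoint(readings: Dict[float, float], od_untreated: float,
                      od_blank: float = 0.0, cutoff: float = 90.0) -> Optional[float]:
    """Lowest concentration such that it and every higher one reach `cutoff` % inhibition.

    `readings` maps compound concentration to the OD measured at that concentration.
    """
    endpoint = None
    for conc in sorted(readings, reverse=True):  # walk down from the highest concentration
        if percent_inhibition(readings[conc], od_untreated, od_blank) >= cutoff:
            endpoint = conc
        else:
            break
    return endpoint
```

Walking down from the top concentration means an isolated inhibited well below a growing well (for example, a pipetting artifact) does not lower the reported endpoint.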

Alternatively, the yeast two-hybrid system (Fields, S. and Song, O. 1989, Nature
340:245-246) may be adapted to screen for ligands that bind CEG polypeptides. Generally,
the yeast two-hybrid system is performed in a yeast host cell carrying a reporter gene, and
is based on the modular nature of the GAL4 transcription factor, which has a DNA-binding
domain and a transcriptional activation domain. The yeast two-hybrid system relies on
the physical interaction between a recombinant polypeptide that comprises the GAL4
DNA-binding domain and another recombinant polypeptide that comprises the GAL4
transcriptional activation domain. The physical interaction between the two recombinant
polypeptides reconstitutes the transcriptional activity of the transcription factor, thereby
causing expression of the reporter gene. Either of the recombinant polypeptides used in
the two-hybrid system can be generated to include a CEG polypeptide sequence to screen
for binding partners of CEG.

Another method uses the bacterial CEG proteins as the basis for in vitro assay systems to
detect binding agents. Typically, the in vitro screening method comprises: a) generating
the CEG protein of the invention, or membranes enriched in the CEG protein; b)
exposing the CEG protein or membranes to a candidate agent; and c) detecting the
interaction of the CEG protein with the agent by any suitable means. Additionally, the
screening methods may be adapted to automated high-throughput procedures, such as
PANDEX® (Baxter-Dade Diagnostics), allowing for efficient high-volume screening
of candidate agents.

An alternative method for screening potential ligands involves an in vitro binding
procedure. Typically, the CEG proteins of the invention can be produced using
recombinant DNA technology and host-vector systems as described herein. A candidate
agent is introduced into a reaction vessel containing the CEG protein, or a fragment
thereof; the candidate agents may be made detectable by methods such as, but not limited
to, radioisotope or chemical labeling. Binding of the CEG protein by a candidate agent can
be determined by any suitable means, including, for example, quantifying bound label
versus unbound label. Binding of a candidate agent may also be detected by methods
similar to an alternative physical method disclosed in U.S. Patent No. 5,585,277. In this
method, binding of a candidate agent to a protein is assessed by monitoring the ratio of
folded protein to unfolded protein, for example by monitoring sensitivity of the protein to
a protease, amenability of the protein to binding by a specific antibody against the folded
state of the protein, binding to a chaperone protein, or binding to any suitable surface.
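For a labeled candidate agent, the bound-versus-unbound measurement can be turned into an apparent dissociation constant. The sketch assumes a trace amount of labeled agent titrated with increasing CEG protein in excess, so that the fraction of label bound follows a simple one-site model, f = [P] / (Kd + [P]); the grid search over Kd is a deliberately simple, dependency-free stand-in for a proper nonlinear fit, and all names are hypothetical.

```python
import math
from typing import Sequence


def fraction_bound(bound_counts: float, free_counts: float) -> float:
    """Fraction of labeled agent recovered in the bound fraction."""
    return bound_counts / (bound_counts + free_counts)


def one_site(protein_conc: float, kd: float) -> float:
    """Fraction of a trace ligand bound when protein is in excess (free ~ total protein)."""
    return protein_conc / (kd + protein_conc)


def fit_kd(protein_concs: Sequence[float], fractions: Sequence[float],
           lo_exp: float = -3.0, hi_exp: float = 3.0, steps: int = 6001) -> float:
    """Least-squares grid search for Kd over 10**lo_exp..10**hi_exp (units of protein_concs)."""
    best_kd, best_sse = math.nan, math.inf
    for i in range(steps):
        kd = 10 ** (lo_exp + (hi_exp - lo_exp) * i / (steps - 1))
        sse = sum((f - one_site(p, kd)) ** 2 for p, f in zip(protein_concs, fractions))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd
```

Spacing the grid logarithmically keeps the relative resolution constant across six orders of magnitude; the protein concentrations and the returned Kd share whatever unit (for example, micromolar) the data are entered in.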

The invention provides methods of identifying compounds that modulate (e.g., activate or 
inhibit) the function of a CEG polypeptide. Essentially any compound can be used in the 
assays of the invention. The preferred compounds are those that are soluble in aqueous 
or organic solutions. It will be appreciated by those of skill in the art that there are many
commercial suppliers of chemical compounds that can be used in the methods of the
invention, including Sigma Chemical Co. (St. Louis, Mo.), Aldrich Chemical Co. (St.
Louis, Mo.), Sigma-Aldrich (St. Louis, Mo.), Fluka Chemika-Biochemica Analytika
(Buchs, Switzerland), and the like.

The present invention provides methods for detecting compounds which are identified as 
modulators of CEG function. The methods of the invention can be performed using 
isolated CEG polypeptides, or use whole cells expressing the CEG polypeptide. The 
steps of the method using isolated CEG polypeptides include: contacting the isolated
CEG polypeptide with a candidate compound; and determining whether the function of 
the CEG polypeptide is altered. The steps of the method using whole cells include: 
contacting the whole cells with a candidate compound; and determining whether the cell 
dies, indicating the compound inhibited the function of a CEG polypeptide. 
The preferred methods of the invention provide high-throughput screening assays for
identifying compounds that modulate the function of a CEG polypeptide. The
high-throughput methods permit screening of large libraries of compounds. For example,
the high-throughput methods can use automated assay steps. The assays can be performed
in parallel on a solid support; microtiter formats on microtiter plates in robotic assays
are well known. A preferred embodiment of the methods includes adapting the methods
to use microtiter plates or pico-, nano-, or microliter arrays. In high-throughput assays, it
is desirable to run positive controls to ensure that the components of the assays are
working properly.
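One widely used way to check that positive and negative controls are well separated on a high-throughput plate is the Z'-factor, Z' = 1 - 3(σp + σn)/|μp - μn|; this metric is swapped in here for illustration and is not named in the source. The sketch also normalizes each test well to a percent effect between the two control means; the function names are hypothetical.

```python
import statistics


def z_prime(pos, neg):
    """Z'-factor from positive- and negative-control well signals (sample standard deviations)."""
    spread = 3 * (statistics.stdev(pos) + statistics.stdev(neg))
    return 1 - spread / abs(statistics.mean(pos) - statistics.mean(neg))


def percent_effect(signal, pos, neg):
    """Place a test-well signal on a 0-100 scale between the negative (0) and positive (100) means."""
    mean_pos, mean_neg = statistics.mean(pos), statistics.mean(neg)
    return 100.0 * (signal - mean_neg) / (mean_pos - mean_neg)
```

Plates with a Z' of roughly 0.5 or above are often regarded as well separated; a lower value suggests that the controls, not the test compounds, should be examined first.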

The high-throughput screening methods of the invention include providing a
combinatorial library containing a large number of compounds (candidate modulator
compounds) (Borman, S., C&E News, 1999, 70(10), 33-48). Such combinatorial
chemical libraries can be screened in one or more assays to identify library members
(particular chemical species or subclasses) that exhibit the ability to modulate the
function of the CEG polypeptide (Borman, S., supra; Dagani, R., C&E News, 1999,
70(10), 51-60). The compounds so identified can serve as lead compounds or can
themselves be used as potential or actual therapeutics.

A combinatorial chemical library is a collection of diverse chemical compounds 
generated by using either chemical synthesis or biological synthesis, to combine a 
number of chemical building blocks, such as reagents. For example, a linear 
combinatorial chemical library, such as a polypeptide library, is formed by combining a 
set of chemical building blocks (amino acids) in every possible way for a given 
compound length (i.e., the number of amino acids in a polypeptide compound). Millions 
of chemical compounds can be synthesized through such combinatorial mixing of 
chemical building blocks. 
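The combinatorial arithmetic in the preceding paragraph can be made concrete: with 20 amino acid building blocks, a linear peptide library of length n contains 20^n members, so hexapeptides alone number 64,000,000. The sketch enumerates such a library lazily with `itertools.product`; the alphabet and function names are illustrative assumptions, and a real library would be synthesized and screened rather than listed.

```python
from itertools import islice, product
from typing import Iterator

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def library_size(n_blocks: int, length: int) -> int:
    """Number of distinct linear compounds when every position can take any building block."""
    return n_blocks ** length


def enumerate_peptides(length: int, alphabet: str = AMINO_ACIDS) -> Iterator[str]:
    """Yield every peptide of the given length, combining building blocks in every possible way."""
    for combo in product(alphabet, repeat=length):
        yield "".join(combo)


# Show the first few dipeptides without materializing the whole library.
print(list(islice(enumerate_peptides(2), 3)))  # ['AA', 'AC', 'AD']
```

Using a generator keeps memory use constant, which matters because the library size grows exponentially with compound length.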

Preparation and screening of combinatorial chemical libraries is well known to those of skill in 
the art. Such combinatorial chemical libraries include, but are not limited to, peptide libraries 
(see, e.g., U.S. Pat. No. 5,010,175, Furka, Int. J. Pept. Prot. Res., 1991, 37:487-493 and 

WO 01/49721 PCT/US00/35604

Houghton, et al., Nature, 1991, 354, 84-88). Other chemistries for generating chemical
diversity libraries can also be used. Such chemistries include, but are not limited to, peptoids
(PCT Publication No. WO 91/19735); encoded peptides (PCT Publication No. WO 93/20242);
random bio-oligomers (PCT Publication No. WO 92/00091); benzodiazepines (U.S. Pat. No.
5,288,514); diversomers, such as hydantoins, benzodiazepines and dipeptides (Hobbs, et al.,
Proc. Nat. Acad. Sci. USA, 1993, 90, 6909-6913); vinylogous polypeptides (Hagihara, et al., J.
Amer. Chem. Soc., 1992, 114, 6568); nonpeptidal peptidomimetics with β-D-glucose
scaffolding (Hirschmann, et al., J. Amer. Chem. Soc., 1992, 114, 9217-9218); analogous
organic syntheses of small compound libraries (Chen, et al., J. Amer. Chem. Soc., 1994, 116,
2661; Armstrong, et al., Acc. Chem. Res., 1996, 29, 123-131); small organic molecule
libraries (see, e.g., benzodiazepines, Baum, C&E News, 1993, Jan. 18, page 33);
oligocarbamates (Cho, et al., Science, 1993, 261, 1303); peptidyl phosphonates
(Campbell, et al., J. Org. Chem., 1994, 59, 658); nucleic acid libraries (see, Seliger, H., et al.,
Nucleosides & Nucleotides, 1997, 16, 703-710); peptide nucleic acid libraries (see, e.g., U.S.
Pat. No. 5,539,083); antibody libraries (see, e.g., Vaughn, et al., Nature Biotechnology, 1996,
14(3), 309-314 and PCT/US96/10287); carbohydrate libraries (see, e.g., Liang, et al., Science,
1996, 274, 1520-1522 and U.S. Pat. No. 5,593,853; Nilsson, U.J., et al., Combinatorial
Chemistry & High Throughput Screening, 1999, 2, 335-352; Schweizer, F. and Hindsgaul, O.,
Current Opinion In Chemical Biology, 1999, 3, 291-298); isoprenoids (U.S. Pat. No.
5,569,588); thiazolidinones and metathiazanones (U.S. Pat. No. 5,549,974); pyrrolidines (U.S.
Pat. Nos. 5,525,735 and 5,519,134); morpholino compounds (U.S. Pat. No. 5,506,337);
benzodiazepines (U.S. Pat. No. 5,288,514); and other similar art.

Devices for the preparation of combinatorial libraries are commercially available (see,
e.g., 357 MPS, 390 MPS, Advanced Chem. Tech, Louisville, Ky.; Symphony, Rainin,
Woburn, Mass.; 433A, Applied Biosystems, Foster City, Calif.; 9050 Plus, Millipore,
Bedford, Mass.). In addition, numerous combinatorial libraries are themselves
commercially available (see, e.g., ComGenex, Princeton, N.J.; Asinex, Moscow, RU;
Tripos, Inc., St. Louis, Mo.; ChemStar, Ltd., Moscow, RU; 3D Pharmaceuticals, Exton,
Pa.; Martek Biosciences, Columbia, Md.; etc.).
In the high throughput methods of the invention, several thousand different candidate
compounds can be screened in a relatively short period of time. For example, each well
of a microtiter plate can be used to run a separate assay against a selected potential
modulator or, if concentration or incubation time effects are to be observed, every 5-10
wells can test a single modulator. Thus, a single standard 96-well microtiter plate can assay
about 100 (96) modulators. If 1536-well plates are used, a single plate can easily assay
from about 100 to about 1500 different compounds. It is possible to assay many different
plates per day; assay screens for up to about 6,000-20,000, and even up to about
100,000-1,000,000, different candidate modulator compounds are possible using the
methods of the invention.
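The per-plate figures above follow from simple division. The Python sketch below assumes one compound per well unless a replicate count is given, and uses a hypothetical `control_wells` parameter for the positive controls mentioned earlier; it reproduces the quoted ranges rather than describing a specific robotic system.

```python
import math


def compounds_per_plate(wells: int, wells_per_compound: int = 1, control_wells: int = 0) -> int:
    """Number of distinct candidate modulators that fit on one plate."""
    usable = wells - control_wells
    return usable // wells_per_compound


def plates_needed(n_compounds: int, per_plate: int) -> int:
    """Plates required to screen n_compounds once each."""
    return math.ceil(n_compounds / per_plate)


if __name__ == "__main__":
    print(compounds_per_plate(96))        # one well per modulator
    print(compounds_per_plate(1536))      # one well per modulator
    print(compounds_per_plate(1536, 10))  # 10 wells per modulator (dose/time series)
    print(plates_needed(100_000, compounds_per_plate(1536)))
```

With 10 wells per modulator a 1536-well plate holds 153 compounds, and with one well each it holds 1536, which brackets the "about 100 to about 1500" range; screening 100,000 compounds at one well each would need 66 such plates. Reserving wells for controls lowers these numbers slightly.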

The following examples are presented to illustrate the present invention and to assist one of 
ordinary skill in making and using the same. The examples are not intended in any way to
otherwise limit the scope of the invention. 

EXAMPLE 1 

The following provides a general description of how a list of candidate ceg sequences
was generated. The list was generated by selecting candidate ceg gene sequences from a
Concordance web engine using the method described in Bruccoleri, R.E., Dougherty,
T.J., Davison, D.B. (1998) "Concordance analysis of microbial genomes," Nucleic
Acids Res. 26:4482-4486.

Microbial Genomics CEG Discovery Process Summary. 
Microbial Concordance Analysis 

The entire genomic sequence data of various bacteria was acquired from several public 
and proprietary sequence database sources, including GTC (Genome Therapeutics 
Corporation), and TIGR (The Institute for Genomic Research). 
Predicted ORFs from the genomic data were identified, translated, and stored. Only ORFs
of at least 90 amino acid residues in length were retained. Concordance analysis was
performed among the bacteria, and various parameters were used to filter out genes with
high similarity to eukaryotes.

Concordance Analysis 


The entire genomic sequence of various Eubacteria was acquired from several public and
private sources. The proprietary PathoGenome System from Genome Therapeutics
Corporation, Waltham, MA, USA contributed data. Public data was obtained from
GenBank (http://ncbi.nlm.nih.gov), The Institute for Genomic Research (TIGR), the
Yeast Proteome Database from Proteome, Inc. of Beverly, MA, and the Sanger Center of
the Medical Research Council of the United Kingdom (http://www.sanger.ac.uk).
Additionally, the non-microbial sequence data used as a basis for comparison and data
subtraction was obtained from a proprietary database, including the LifeSeq Database
from Incyte Pharmaceuticals, Palo Alto, CA.

Where required, Incyte nucleotide sequences were translated into protein sequences in all
six possible reading frames. GTC supplied predicted protein sequences with their data. In
the case of other eubacterial nucleotide sequences, the program CRITICA (Badger, J. and
Olsen, G., 1999, "CRITICA: coding region identification tool invoking comparative
analysis," Molecular Biology and Evolution 16:512-524) was used to predict the protein
coding regions. The sequences were stored in flat files on a Unix computer system. Each
predicted amino acid sequence had to be greater than 90 amino acids in length.
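The length filter can be illustrated with a naive six-frame scan. This Python sketch is not CRITICA, which combines comparative and statistical evidence; it simply translates each frame with the standard genetic code, splits at stop codons, and keeps Met-initiated stretches longer than 90 residues. The function names are hypothetical, and stretches that run off the end of the sequence without a stop codon are also reported, which a real pipeline might handle differently.

```python
# Standard genetic code (NCBI table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon; unknown codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_orfs(dna: str, min_aa: int = 90):
    """Return (strand, frame, protein) for Met-initiated stretches longer than min_aa."""
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            for segment in translate(seq[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > min_aa:
                    hits.append((strand, frame, segment[start:]))
    return hits


if __name__ == "__main__":
    demo = "ATG" + "GCT" * 95 + "TAA"  # Met + 95 Ala, then a stop codon
    for strand, frame, protein in six_frame_orfs(demo):
        print(strand, frame, len(protein))
```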


Each predicted protein sequence was compared to every other sequence (an "all-against-all"
comparison). The program used was FASTA (Pearson, W.R., "Flexible sequence
similarity searching with the FASTA3 program package," Methods in Molecular Biology
2000 132:185-219). The parameters used were ktup=2, and all scores above the default
cutoff were kept. The output was processed and stored in a Postgres95 database
(http://www.postgresql.org). Graphical user interfaces, using web browser technology,
were constructed to query the database.
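The ktup=2 setting refers to the word length used in FASTA's first, heuristic stage. The Python sketch below shows only that stage in simplified form, as an illustration rather than FASTA3 itself: shared two-residue words are located through a lookup table and counted along each alignment diagonal. The later stages (rescoring the best diagonal regions with a substitution matrix, joining them, and computing an optimized banded local alignment) are omitted, and the function names are hypothetical.

```python
from collections import defaultdict


def ktup_index(seq: str, k: int = 2) -> dict:
    """Map every k-letter word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def diagonal_hits(query: str, subject: str, k: int = 2) -> dict:
    """Count identical k-tuples on each diagonal (subject offset minus query offset)."""
    index = ktup_index(query, k)
    diagonals = defaultdict(int)
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], ()):
            diagonals[j - i] += 1
    return dict(diagonals)


def best_diagonal(query: str, subject: str, k: int = 2):
    """(diagonal, hit count) for the densest diagonal, or None if nothing is shared."""
    diags = diagonal_hits(query, subject, k)
    return max(diags.items(), key=lambda kv: kv[1]) if diags else None


def all_against_all(proteins: dict, k: int = 2) -> dict:
    """Compare every sequence with every other sequence (ordered pairs)."""
    return {
        (a, b): best_diagonal(proteins[a], proteins[b], k)
        for a in proteins
        for b in proteins
        if a != b
    }
```

The number of comparisons grows with the square of the number of sequences, which is one reason the results were stored in a database for later querying rather than recomputed on demand.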
A Concordance Analysis was performed on the data. The query used to generate the
dataset was: show all Streptococcus pneumoniae open reading frames with greater than or
equal to 30% overall protein sequence identity to selected gram-positive and/or
gram-negative bacteria in the database. The data was further required not to match
yeast or human sequences at greater than 30% overall protein sequence similarity. The
resulting dataset included a list of more than 400 conserved amino acid sequences having
known or unknown function. The amino acid sequences having unknown functions formed
the basis of a list designated Conserved Unknown Reading Frames, or CURFs, which is a
subset of the total list of CEGs (i.e., the CEG list includes sequences of both known and
unknown function).
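The concordance query above can be expressed as a filter over stored comparison results. In the Python sketch below, the `Hit` record, the group labels, and the `require_both` flag are hypothetical; the text applies an identity threshold to bacterial matches and a similarity threshold to eukaryotic matches, which this sketch collapses into a single percent-identity field for simplicity.

```python
from dataclasses import dataclass

BACTERIAL = {"gram_positive", "gram_negative"}
EUKARYOTIC = {"yeast", "human"}


@dataclass(frozen=True)
class Hit:
    query: str          # S. pneumoniae ORF identifier
    subject_group: str  # one of BACTERIAL or EUKARYOTIC
    identity: float     # overall percent identity of the best match


def concordant_orfs(hits, bact_min=30.0, euk_max=30.0, require_both=False):
    """ORFs conserved in bacteria (>= bact_min) with no eukaryotic match above euk_max."""
    bacterial_groups = {}
    eukaryotic_match = set()
    for h in hits:
        if h.subject_group in BACTERIAL and h.identity >= bact_min:
            bacterial_groups.setdefault(h.query, set()).add(h.subject_group)
        elif h.subject_group in EUKARYOTIC and h.identity > euk_max:
            eukaryotic_match.add(h.query)
    keep = []
    for orf, groups in bacterial_groups.items():
        if orf in eukaryotic_match:
            continue
        if require_both and not BACTERIAL <= groups:
            continue
        keep.append(orf)
    return sorted(keep)


if __name__ == "__main__":
    demo = [
        Hit("orf1", "gram_positive", 45.0), Hit("orf1", "gram_negative", 35.0),
        Hit("orf2", "gram_positive", 50.0), Hit("orf2", "human", 40.0),
        Hit("orf3", "gram_negative", 31.0),
        Hit("orf4", "gram_positive", 25.0),
    ]
    print(concordant_orfs(demo))                     # gram-positive and/or gram-negative
    print(concordant_orfs(demo, require_both=True))  # both groups required
```

The `require_both` switch reflects the ambiguity of "both ... and/or" in the original query wording; either reading can be applied.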

The resulting list of conserved genes (e.g., more than 400 sequences) was used as a basis 
for selecting and screening bacterial gene sequences that are essential for cell viability. 
The Concordance system was designed to permit high-throughput identification of 
conserved gene sequences in the database. (Bruccoleri, R, Dougherty, T, and Davison, D. 
1998 "Concordance analysis of microbial genomes" Nucleic Acids Res. 26:4482-4486.) 

Data Curation And Analysis 

The exact N-terminal (translational start) and C-terminal ends of the genes were identified by
pairwise similarity searches and multiple sequence alignments. Ribosome binding sites,
terminators, nearby genes, and operons were also identified.

The resulting list of conserved genes was used as a basis for selecting and screening 
bacterial gene sequences that are essential for cell viability. This Concordance system 
was designed to permit high throughput use of the conserved gene sequences contained 
on the list. A set of Knockout PCR primers was generated, based on the list of
conserved genes, for use in the gene disruption procedure described
below. The PCR primers were designed to amplify a central 300-500 bp region of the
ceg (so that integration would not generate a functional copy of the ceg gene). The
primers were ordered electronically, placed in a 96-well format, and used in the gene
disruption procedure as described below.
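The placement rule for the knockout fragment can be expressed as a small helper. In this Python sketch, the 400 bp default, the 50 bp margin kept clear of each gene end, and the 20-nt primer length are assumptions for illustration; real primer design would also weigh melting temperature, GC content, and secondary structure.

```python
def reverse_complement(dna: str) -> str:
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def central_window(gene_len: int, target: int = 400, min_len: int = 300,
                   max_len: int = 500, margin: int = 50):
    """Half-open (start, end) of a centred internal fragment of 300-500 bp."""
    length = min(target, max_len, gene_len - 2 * margin)
    if length < min_len:
        raise ValueError("gene too short for an internal 300-500 bp fragment")
    start = (gene_len - length) // 2
    return start, start + length


def knockout_primers(gene: str, primer_len: int = 20, **window_kw):
    """Forward primer at the window start; reverse primer antiparallel to its end."""
    start, end = central_window(len(gene), **window_kw)
    forward = gene[start:start + primer_len].upper()
    reverse = reverse_complement(gene[end - primer_len:end])
    return forward, reverse, (start, end)
```

Because the fragment stops short of both the start and stop codons, a single crossover integration leaves two truncated copies of the gene rather than one intact copy, which is the point of amplifying a central region.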

EXAMPLE 2 

The following provides a description of the procedure to generate recombinant vectors of 
pEVP-3 having inserts of candidate ceg nucleotide sequences. The Knockout primers 
generated by the method described in Example 1 above were used to generate DNA 
fragments comprising candidate ceg sequences. 


Genomic PCR Knockout Target Fragment Generation

Reactions were set up in 96-well plate format (36 µl H2O, 5 µl 10x Vent™ buffer, 1 µl
gene-specific knockout forward primer (0.5 µg/µl), 1 µl gene-specific knockout reverse
primer (0.5 µg/µl), 0.5 µl Vent™ DNA polymerase (2000 U/ml, New England Biolabs,
Beverly, MA), 1.5 µl each dNTP (10 mM; 6.0 µl total), 0.5 µl S. pneumoniae chromosomal
DNA (0.5 µg/µl); 50 µl total volume per reaction).
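For a full 96-well plate it is convenient to pre-mix the shared components. The Python sketch below uses the per-reaction volumes listed above; the 10% pipetting overage and the choice to add only the gene-specific primers well by well (keeping template in the master mix) are assumptions, not part of the described procedure.

```python
# Per-reaction volumes (µl) from the knockout PCR recipe above.
PER_REACTION_UL = {
    "H2O": 36.0,
    "10x Vent buffer": 5.0,
    "forward primer (gene-specific)": 1.0,
    "reverse primer (gene-specific)": 1.0,
    "Vent DNA polymerase": 0.5,
    "dNTPs (1.5 µl each of 4)": 6.0,
    "S. pneumoniae chromosomal DNA": 0.5,
}
# Primers differ from well to well, so they are added individually.
PER_WELL_ONLY = {"forward primer (gene-specific)", "reverse primer (gene-specific)"}


def master_mix(n_reactions: int = 96, overage: float = 0.10) -> dict:
    """Scale the shared components for n_reactions plus a pipetting overage."""
    factor = n_reactions * (1 + overage)
    return {
        name: round(vol * factor, 1)
        for name, vol in PER_REACTION_UL.items()
        if name not in PER_WELL_ONLY
    }


if __name__ == "__main__":
    print("total per reaction:", sum(PER_REACTION_UL.values()), "µl")
    for name, vol in master_mix().items():
        print(f"{name}: {vol} µl")
```

Each well then receives 48 µl of master mix plus 1 µl of each gene-specific primer, restoring the 50 µl reaction volume.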

The nucleotide sequences of the forward and reverse knockout primer pairs were
generated from the nucleotide sequence information obtained from the Genome
Therapeutics Corporation database for Streptococcus pneumoniae. Each primer pair was
used in a PCR reaction to generate a unique internal (i.e., central region) fragment
of the candidate gene targeted for knockout.

The PCR machine was programmed as follows: initial, 95 °C for 5 minutes; 30 cycles of
95 °C for 1 minute, 58 °C for 1 minute, and 72 °C for 30 seconds; final, 72 °C for 10
minutes; then 4 °C, held indefinitely. After the fragments were purified with a PCR
purification kit (Qiagen), 5 µl of each reaction was run on a 0.8% agarose gel to visualize
the fragments, and ligation reactions were then performed.
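The thermal program above can be written down as data, which makes its hold time easy to check. This Python sketch (the `Step` record and function name are hypothetical) sums the programmed hold times; actual run time will be longer because ramp rates, which depend on the instrument, are ignored, as is the open-ended 4 °C hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


def program_seconds(initial: Step, cycle: list, n_cycles: int, final: Step) -> int:
    """Hold times only: ramping and the open-ended 4 °C hold are not counted."""
    return initial.seconds + n_cycles * sum(step.seconds for step in cycle) + final.seconds


KNOCKOUT_PCR = dict(
    initial=Step(95, 5 * 60),
    cycle=[Step(95, 60), Step(58, 60), Step(72, 30)],
    n_cycles=30,
    final=Step(72, 10 * 60),
)

if __name__ == "__main__":
    total = program_seconds(**KNOCKOUT_PCR)
    print(f"{total} s = {total / 60:.0f} min")
```

For the knockout program this gives 300 + 30 × 150 + 600 = 5400 seconds, or 90 minutes of programmed holds.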
Ligation reactions were set up in 96-well plate format (10.0 µl genomic PCR
fragment (generated in the step above), 1.0 µl SmaI-cut pEVP-3 vector (1:10 dilution of
vector DNA at 50-100 ng/µl), 1.5 µl 10x ligation buffer (New England Biolabs), 1.0 µl
T4 DNA ligase (New England Biolabs, 400,000 U/ml), 1.5 µl ddH2O; 15.0 µl total
reaction volume).

Reactions were incubated in the 96-well plate at approximately 14 °C overnight in the PCR
machine. Transformation into E. coli for in vivo amplification proceeded the
following day.

The nucleotide sequences of the forward and reverse primer pairs used for the polarity
test were generated in a similar manner, from the nucleotide sequence information
obtained from the Genome Therapeutics Corporation database for Streptococcus
pneumoniae. Each primer pair was used in a PCR reaction to generate a unique
fragment of the candidate gene targeted for the polarity test. The fragment generated for
the polarity test included the entire ceg coding region but lacked the expression
regulatory sequences.

Transformation into E. coli (strain LE392):

The next day, 3 µl of the above ligation mix was used per transformation reaction with 50 µl
of LE392 competent cells. Reactions were set up in 96-well plate format, incubated on ice
for 30 minutes, heat-shocked at 42 °C for 90 seconds, and incubated on ice for 2 minutes;
100 µl SOC medium (Gibco BRL) was then added, and the reactions were incubated at 37 °C
on a platform shaker for 1 hour. The cells were plated on LB/chloramphenicol (13.0 µg/ml)
agar plates and incubated overnight at 37 °C with the plates inverted, and the constructs
were then confirmed by colony PCR. The universal primers flanking the insert site in
pEVP-3 were used for PCR amplification.

The colony PCR was set up in 96-well plate format as follows: 36.5 µl H2O, 0.5 µl pEVP-3
forward primer (0.25 µg/µl), 0.5 µl pEVP-3 reverse primer (0.25 µg/µl), 1.5 µl each dNTP
(10 mM; 6.0 µl total), 0.5 µl Vent™ DNA polymerase, 5 µl 10x Vent™ buffer, and 1 µl of a
1:50 cell dilution; 50 µl total volume.

pEVP-3 forward primer: 5' CATCAAGCTTATCGATACCGTCG 3' (SEQ ID NO:437)
pEVP-3 reverse primer: 5' CACAGTAGTTCACCACCTTTTCCC 3' (SEQ ID NO:438)
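As a quick check of these primers against the 58 °C annealing step of the colony PCR program, the Python sketch below computes length, GC content, and a rough melting temperature with the commonly used basic formula Tm = 64.9 + 41(GC - 16.4)/N. That formula is an approximation chosen here for illustration; it ignores salt and primer concentration, so nearest-neighbour methods would give different values.

```python
def gc_count(primer: str) -> int:
    return sum(base in "GC" for base in primer.upper())


def tm_basic(primer: str) -> float:
    """Rough Tm estimate: 64.9 + 41 * (GC - 16.4) / N (no salt or nearest-neighbour terms)."""
    n = len(primer)
    return 64.9 + 41 * (gc_count(primer) - 16.4) / n


PRIMERS = {
    "pEVP-3 forward (SEQ ID NO:437)": "CATCAAGCTTATCGATACCGTCG",
    "pEVP-3 reverse (SEQ ID NO:438)": "CACAGTAGTTCACCACCTTTTCCC",
}

if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        gc = gc_count(seq)
        print(f"{name}: {len(seq)} nt, GC {100 * gc / len(seq):.0f}%, Tm ~{tm_basic(seq):.1f} °C")
```

By this estimate the forward primer (23 nt, 11 G+C) sits near 55 °C and the reverse primer (24 nt, 12 G+C) near 57 °C.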

Colonies of E. coli LE392 were picked onto a master plate of LB + 13 µg/ml
chloramphenicol (incubated during the day at 37 °C) and then into 50 µl H2O in a
96-well plate. 1 µl of this dilution was used in the above PCR reaction (if the 96-well
dilution plate is kept, a master plate need not be prepared). Cultures for minipreps of
plasmid candidates may be prepared directly from the cell dilutions.


The PCR program was: 95 °C for 5 minutes; 30 cycles of 95 °C for 1 minute, 58 °C for 1
minute, and 72 °C for 30 seconds; 72 °C for 10 minutes; hold at 4 °C.

10 µl of each reaction was run on a 1.0% TBE gel. A gel designed for 96-well plates and a
multichannel pipettor were used to ease loading of the sample rows. The gel was run and
stained with ethidium bromide. Positive clones were identified by an insert of the
appropriate molecular size, amplified by the flanking pEVP-3 primers.

Minipreps Of Plasmids To Identify Cells Carrying The pEVP-3 Vector With An Insert

The constructs that carried an insert were identified. The constructs having an insert
were inoculated into 5 ml LB/Cm cultures and incubated overnight at 37 °C with
aeration. Miniprep plasmid DNA was prepared by a standard procedure. The miniprep
DNA was digested with appropriate restriction enzymes to confirm the presence of the
insert (the enzymes flank the SmaI site in pEVP-3) (10 µl miniprep DNA, 2 µl 10x buffer,
1 µl XbaI, 1 µl XhoI, 6 µl ddH2O; 20 µl total volume for the digest).
To confirm the presence of an insert, the digest reactions were electrophoresed on an
agarose gel and the gel was stained with ethidium bromide. The positive clones were
used for the S. pneumoniae KNOCKOUT procedure.

The confirmatory PCR reactions, using knockout-specific primers (quality control step),
involved 35.5 µl H2O, 5 µl 10x Vent™ buffer, 1 µl knockout forward primer (0.5 µg/µl),
1 µl knockout reverse primer (0.5 µg/µl), 0.5 µl Vent™ DNA polymerase (2000 U/ml),
1.5 µl each dNTP (10 mM; 6.0 µl total), and 1.0 µl miniprep DNA from the test clone, in a
50 µl total reaction volume. The PCR program was as follows: 95 °C for 5 minutes;
30 cycles of 95 °C for 1 minute, 60 °C for 1 minute, and 72 °C for 30 seconds; 72 °C for
10 minutes; hold at 4 °C. The presence of the correct-sized insert was confirmed
by agarose gel electrophoresis and ethidium bromide staining. The confirmed clones
were used for the S. pneumoniae gene KNOCKOUT procedure. Glycerol stocks were
made of all positive E. coli LE392 constructs and frozen at -80 °C.

EXAMPLE 3 

The following provides a description of the high throughput gene disruption procedure
used in S. pneumoniae (i.e., the gene knockout procedure). The candidate ceg
fragments that were generated by the method described in Example 2 were used in the
gene disruption procedure in order to identify ceg nucleotide sequences that are required
for cell viability.

Reactions were set up in 1.5 ml Eppendorf tubes or a 96-well plate with 1 µg total of
miniprep pEVP-3 + insert DNA (usually 10 µl of Qiagen miniprep DNA); then 200 µl of
S. pneumoniae (strain Rx-1) competent cells diluted 1:10 in competence medium was
added (1 ml of competence medium = 980 µl Todd Hewitt broth (Difco Laboratories) with
0.5% yeast extract, 20 µl 10% BSA, 1 µl 10% CaCl2, and 0.5 µl Csp-1 competence
peptide (200 µg/ml)).
Controls were run with each KNOCKOUT experiment and involved 1 µg of the pEVP-3 lytA
construct as the positive control (non-essential) or 1 µg of the pEVP-3 ftsZ construct as
the negative control (essential). The 96-well plates and controls were then incubated at
37 °C for 2.5 to 3 hours in a 37 °C room without shaking. The 200 µl samples were plated
on Todd Hewitt agar plates with 0.5% yeast extract and 2 µg/ml chloramphenicol.

The plates were incubated overnight at 37 °C in a 5% CO2 incubator. Control plates were
checked for the presence of colonies (pEVP-3::lytA) and for no growth (pEVP-3::ftsZ).
Plates were examined for growth: approximately 70-150 colonies designated a
non-essential gene, and zero colonies designated an essential gene.
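The scoring rule above can be sketched in Python as follows. The thresholds come from the text; treating nonzero colony counts outside the approximate 70-150 range as "review" rather than forcing a call, and refusing to score an experiment when either control fails, are assumptions of this sketch, and the function names are hypothetical.

```python
def call_gene(colonies: int, nonessential=(70, 150)) -> str:
    """Zero colonies -> essential; roughly 70-150 -> non-essential; else flag for review."""
    if colonies == 0:
        return "essential"
    low, high = nonessential
    if low <= colonies <= high:
        return "non-essential"
    return "review"


def score_experiment(counts: dict, lyta_control: int, ftsz_control: int) -> dict:
    """Score candidate plates only if both controls behaved as expected."""
    if lyta_control == 0:
        raise ValueError("lytA positive control gave no colonies; transformation failed")
    if ftsz_control != 0:
        raise ValueError("ftsZ negative control grew; selection failed")
    return {gene: call_gene(n) for gene, n in counts.items()}
```

Gating on the controls matters because a failed transformation would otherwise make every candidate look essential.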


The polarity test was performed in a similar manner, using the polarity fragments
described in Example 2.

EXAMPLE 4 

The following provides a description of the autolysin procedure used to determine that
the non-essential control samples of S. pneumoniae contain a disrupted lytA gene.

Phenotypic Autolysin Test

The culture plates containing transformants carrying the lytA control vector were flooded
with 0.1% deoxycholate in H2O. The plates were observed after 5-10 minutes. Plates
with "ghosts" indicated an intact lytA gene, and plates without "ghosts" indicated a
disrupted lytA gene. The "ghost" phenomenon is due to detergent-triggered autolysis of
the cells, causing a gradual fading of the colonies.

The detergent treatment triggers the autolysin (the lytA gene product) in lytA-intact cells;
it cannot trigger autolysis in lytA-disrupted cells. Colonies with an intact lytA gene
"ghost" within 5-10 minutes due to massive pneumococcal cell lysis.
EXAMPLE 5

The following provides a description of the procedure used to express the CEG proteins
(i.e., the proteins designated CFE proteins) in E. coli cells.

CEG Protein Production 

Full-length ceg genes were inserted into the pET-21 expression vector for use with the
E. coli BL21 λDE3 expression system, using the following method:

For each ceg, custom primers were used to insert the N- and C-termini into the vectors
such that the 5' end (N-terminus of the CEG) is positioned properly for expression behind
the T7 promoter and optimally placed with regard to the pET ribosome binding site. The
pET vectors contain an NdeI site, which allows positioning of the ATG start site in the
vector. In cases where the ceg sequence contains an internal NdeI site, blunt ligation of
the ceg PCR fragment into the vector is accomplished via Klenow fill-in of the NdeI site.
In many cases, primers were also designed such that the ceg 3' end (C-terminus of the
expressed protein) will contain an in-frame extension of 6X-histidine residues, encoded in
the vector sequence of pET-21. The individual cegs were PCR amplified via custom
designed primers as described above. Both the ceg PCR products and the vector DNA
were digested with appropriate restriction enzymes. The full-length cegs were ligated into
the pET expression vector. The ligation mixture was transformed into competent E. coli
BL21 λDE3 cells, and transformants were selected on LB agar with 50 µg/ml ampicillin.
Positive insert-bearing clones were screened via minipreps of the plasmids and size
analysis on 0.8% agarose gels, with detection by ethidium bromide staining, as above.

Protein Production 

The proper reading frame of each ceg inserted into pET-21 is verified by DNA
sequencing.
A small (2-5 ml) test culture of E. coli BL21 λDE3 with the insert-bearing plasmid is
tested for protein expression by IPTG induction of the expression vector for 1-2 hours.
Expression is verified by SDS-polyacrylamide gel electrophoresis analysis of a
whole cell extract (SDS extract of 0.5-1 ml of cells treated at 100 °C for 5 minutes) to
determine whether the protein is over-expressed and migrates at the correct predicted
molecular weight.

The protein is overproduced and purified via the following method. A large-scale
(500-1000 ml) culture of E. coli is grown to early logarithmic phase in broth (e.g., LB
broth), and protein expression is induced for 2 hours with IPTG
(isopropyl-β-D-thiogalactoside). The cells are harvested by centrifugation (8000 × g,
15 minutes), and the cell pellets are resuspended in 20 ml of buffer. The cells are lysed by
sonication, and the lysate is centrifuged at low speed (5000 × g, 15 minutes) to remove
unbroken cells. The supernatant fluid, containing the over-expressed protein, is subjected
to Ni-NTA affinity column chromatography (Qiagen, Inc., Chatsworth, CA). The
6X-histidine residues linked at the C-terminal end of the CEG proteins permit rapid protein
purification via selective binding to a Ni-NTA resin column. The protein-bound Ni-NTA
resin is washed to remove contaminants, and the bound proteins are subsequently eluted
with imidazole and recovered. This procedure can be scaled up to larger volumes for
higher yields of proteins.

EXAMPLE 6 


The following provides a description of the methods used to purify all 2CEG
polypeptides (i.e., 2CFE polypeptides #19-117; SEQ ID NOS:349-436) having a
histidine tag at their C-terminal ends. The 2CEG polypeptides having the His-tags were
produced by the methods described in Example 5, supra. As an example, results of the
purification of the 2CFE 75 polypeptide are presented.
Production Of The CFE Polypeptides

The BL21 λDE3 cells harboring recombinant pET-21 vectors carrying a 2CFE nucleotide
sequence (SEQ ID NOS:244-331) were cultured in LB broth containing ampicillin.
When the A600 reached approximately 0.6, protein production was induced by adding
IPTG to 1.0 mM, and the cells were cultured for an additional 2 hours. The cells were
collected by centrifugation, and the cell pellet was sonicated in Solution A (50 mM
NaPO4, 300 mM NaCl, pH 8.0). The sonicated cells were centrifuged at 10,000
RPM to remove the debris.
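The induction step is a simple dilution. Assuming a 1 M (1000 mM) IPTG stock, which the text does not specify, the Python sketch below computes how much stock brings a culture to 1.0 mM, accounting for the small volume the stock itself adds; the function name is hypothetical.

```python
def stock_volume_ml(final_mM: float, culture_ml: float, stock_mM: float = 1000.0) -> float:
    """Volume of stock to add so the final concentration is reached after mixing.

    Solves C_stock * v = C_final * (V_culture + v) for v.
    """
    if stock_mM <= final_mM:
        raise ValueError("stock must be more concentrated than the target")
    return final_mM * culture_ml / (stock_mM - final_mM)


if __name__ == "__main__":
    print(f"{stock_volume_ml(1.0, 500):.3f} ml of 1 M IPTG per 500 ml culture")
```

For a 500 ml culture this is about 0.50 ml of 1 M stock; the correction for the added volume is negligible at this dilution.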

Purification Of The CFE Polypeptide 

The supernatant was diluted with Solution A and loaded onto a Ni-NTA column (Qiagen)
equilibrated with Solution A; the column bed size was 2.5 x 25 cm, and the flow rate was
approximately 3.0 ml/minute. The 2CFE protein was eluted using a linear gradient of
0-250 mM imidazole in 450 ml, at a flow rate of approximately 3.0 ml/minute. The
eluate was collected in 22 ml fractions, and the elution was monitored by
spectrophotometry. The amount of protein in the eluted fractions was estimated using
the Bradford method (Bradford, M. M., 1976, Anal. Biochem. 72:248), and the samples
were run on an SDS-PAGE gel (Novex EC6008) (Figure 3A). Fractions were selected for
pooling based on the results of the SDS-PAGE gel. The pooled fractions were
concentrated to approximately 5 ml using a 10,000 MW cutoff Centricon (Amicon).
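To relate fraction numbers to eluting conditions, the Python sketch below maps each 22 ml fraction of the 0-250 mM, 450 ml gradient to its approximate imidazole concentration and computes the gradient duration at 3.0 ml/minute. It assumes collection starts exactly when the gradient begins and ignores column dead volume and mixing; those are simplifications, not details from the text, and the function name is hypothetical.

```python
import math


def gradient_fractions(start_mM=0.0, end_mM=250.0, gradient_ml=450.0,
                       fraction_ml=22.0, flow_ml_per_min=3.0):
    """Approximate imidazole concentration at the midpoint of each fraction.

    Assumes fraction collection starts with the gradient and ignores the column's
    dead volume and mixing, so the values are only a guide for locating peaks.
    """
    slope = (end_mM - start_mM) / gradient_ml  # mM per ml of eluent
    n_fractions = math.ceil(gradient_ml / fraction_ml)
    rows = []
    for i in range(n_fractions):
        lo = i * fraction_ml
        hi = min((i + 1) * fraction_ml, gradient_ml)
        rows.append((i + 1, round(start_mM + slope * (lo + hi) / 2, 1)))
    return rows, gradient_ml / flow_ml_per_min


if __name__ == "__main__":
    rows, minutes = gradient_fractions()
    print(f"{len(rows)} fractions over {minutes:.0f} min")
    print(rows[0], rows[-1])
```

Under those assumptions the gradient runs for 150 minutes across 21 fractions (the last one partial), with the imidazole concentration rising by about 12 mM per fraction.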


For the 2CFE 75 polypeptide, a precipitate formed; it was redissolved by increasing the
sample volume and removing the imidazole by repeated concentration in 50 mM Tris,
100 mM NaCl, pH 7.5. Varying amounts of the 2CFE 75 polypeptide were diluted in
either 20 mM Tris, 20 mM KCl, pH 7.5 or 20 mM Tris, 20 mM MgCl2, pH 7.5 to
concentrations of 12, 24, or 36 µg/ml. The diluted samples were electrophoresed on an
SDS-PAGE gel under non-reducing conditions (Figure 3B). The results of Figure 3B
suggest that 2CFE 75 forms a multimer.
What is claimed is:

1. An isolated nucleic acid molecule encoding a polypeptide which is (1) essential
for the viability of a bacterial cell and (2) has at least any one of the functions of a
pantothenate kinase, a Holliday Junction branch migration protein, a single
stranded DNA binding protein, a phosphoglucosamine mutase, an
acetyltransferase, a uridylyltransferase, a malonyl CoenzymeA:ACP transacylase,
a 3-oxoacyl-ACP synthase II, a 3-oxoacyl-ACP reductase, a
phosphomethylpyrimidine (HMP-P) kinase, a GTP binding protein, an ATP
binding protein, or a 4-aminoimidazole carboxylase.

2. The isolated nucleic acid molecule of claim 1, wherein the nucleic acid molecule
is shown in SEQ ID NO: 97 or Figure 115 and wherein the polypeptide is a
pantothenate kinase.

3. The isolated nucleic acid molecule of claim 1, wherein the nucleic acid molecule 
is shown in SEQ ID NO:35, Figure 60, SEQ ID NO:19, or Figure 44, and wherein
the polypeptide is a Holliday Junction branch migration protein. 

4. The isolated nucleic acid molecule of claim 1, wherein the nucleic acid molecule 
is shown in SEQ ID NO:8 or Figure 33 and wherein the polypeptide is a single
stranded DNA binding protein. 

5. The isolated nucleic acid molecule of claim 1, wherein the nucleic acid molecule 
is shown in SEQ ID NO: 3 or Figure 28 and wherein the polypeptide is a 
phosphoglucosamine mutase. 


6. The isolated nucleic acid molecule of claim 1, wherein the nucleic acid molecule 
is shown in SEQ ID NO:82 or Figure 103 and wherein the polypeptide is an
acetyltransferase.
7. The isolated nucleic acid molecule of claim 1, wherein the nucleic acid molecule
is shown in SEQ ID NO: 82 or Figure 103 and wherein the polypeptide is a 
uridylyltransferase. 

8. The isolated nucleic acid molecule of claim 1, wherein the nucleic acid molecule 
is shown in SEQ ID NO:30 or Figure 55 and wherein the polypeptide is a 
malonyl CoenzymeA:ACP transacylase.

9. The isolated nucleic acid molecule of claim 1, wherein the nucleic acid molecule
is shown in SEQ ID NO:86 or Figure 107 and wherein the polypeptide is a
3-oxoacyl-ACP synthase II.

10. The isolated nucleic acid molecule of claim 1, wherein the nucleic acid molecule 
is shown in SEQ ID NO:31 or Figure 56 and wherein the polypeptide is a
3-oxoacyl-ACP reductase.


11. The isolated nucleic acid molecule of claim 1, wherein the nucleic acid molecule
is shown in SEQ ID NO:36 or Figure 61 and wherein the polypeptide is a
phosphomethylpyrimidine (HMP-P) kinase.

12. The isolated nucleic acid molecule of claim 1, wherein the nucleic acid molecule 
is shown in SEQ ID NO:37, Figure 62, SEQ ID NO:48, or Figure 73, and 
wherein the polypeptide is a GTP binding protein.

13. The isolated nucleic acid molecule of claim 1, wherein the nucleic acid molecule
is shown in SEQ ID NO:42 or Figure 67 and wherein the polypeptide is an ATP
binding protein. 
14. The isolated nucleic acid molecule of claim 1, wherein the nucleic acid molecule
is shown in SEQ ID NO:84 or Figure 105 and wherein the polypeptide is a
4-aminoimidazole carboxylase.

15. The isolated nucleic acid molecule of claim 1, wherein the nucleic acid molecule 
is shown in SEQ ID NO:48 or Figure 73 and wherein the polypeptide is a GTP 
binding protein. 

16. An isolated nucleic acid molecule encoding a polypeptide which is essential for 
the viability of a bacterial cell, the nucleic acid molecule comprising a sequence 
shown in any one of SEQ ID NOS: 1-113.

17. An isolated nucleic acid molecule encoding a polypeptide which is essential for 
the viability of a bacterial cell, the nucleic acid molecule comprising a sequence 
shown in any one of Figures 26-130. 

18. An isolated nucleic acid molecule encoding any one of a polypeptide designated 
CFE 1-117 having the amino acid sequence shown in SEQ ID NOS: 114-226.

19. An isolated nucleic acid molecule comprising a nucleotide sequence which is 
complementary to the nucleotide sequence of claim 1, 16, 17 or 18. 


20. The isolated nucleic acid molecule of claim 1, 16, 17 or 18 which is DNA or 
RNA. 

21. The isolated nucleic acid molecule of claim 20, which is labeled with a detectable 
marker. 

22. The isolated nucleic acid molecule of claim 21, wherein the detectable marker is 
selected from the group consisting of a radioisotope, a fluorescent compound, a 
bioluminescent compound, a chemiluminescent compound, a metal chelator and
an enzyme. 

23. A vector comprising the nucleotide sequence of claim 1, 16, 17, or 18. 


24. A host-vector system comprising the vector of claim 23, in a suitable host cell. 

25. The host-vector system of claim 24, wherein the suitable host cell is selected from 
a group consisting of a yeast cell, a plant cell, and an animal cell. 


26. The host-vector system of claim 24, wherein the suitable host cell is selected from 
a group consisting of an Escherichia cell, a Bacillus cell, a Pseudomonas cell, a 
Streptococcus cell, and a Streptomyces cell. 

27. An isolated polypeptide which is essential for the viability of a bacterial cell
comprising the amino acid sequence as shown in any one of SEQ ID NOS: 114-226.

28. An isolated polypeptide which is essential for the viability of a bacterial cell 
encoded by the isolated nucleic acid molecule of claim 1, 16, 17, or 18.

29. The isolated polypeptide of claim 27 or 28 which is a fusion polypeptide. 

30. A method for producing a polypeptide having the amino acid sequence of any one
of SEQ ID NOS: 114-226 or a polypeptide encoded by the polynucleotide
sequence as shown in any one of Figures 26-130, comprising:

a) culturing the host-vector system of claim 24 under suitable conditions so as to 
produce the polypeptide; and 

b) recovering the polypeptide so produced. 

31. A polypeptide produced by the method of claim 30.



32. A ligand which binds the polypeptide of claim 27 or 28. 

33. The ligand of claim 32 which is an antibody or an immunologically active 
fragment thereof. 


34. The ligand of claim 33, wherein the antibody is a monoclonal antibody. 

35. The ligand of claim 32 which is a diazalactone. 

36. The ligand of claim 35, wherein the diazalactone comprises the structure: 




37. The ligand of claim 32 which is an N-protected amino acid.

38. The ligand of claim 37, wherein the N-protected amino acid comprises the
structure:



39. The ligand of claim 32 which is an azabicyclodiene.

40. The ligand of claim 39, wherein the azabicyclodiene comprises the structure:




41. The ligand of claim 32 which is an alkaloid.

42. The ligand of claim 41, wherein the alkaloid comprises the structure: 





43. The ligand of claim 41, wherein the alkaloid comprises the structure: 
44. The ligand of claim 41, wherein the alkaloid comprises the structure:





45. The ligand of claim 41, wherein the alkaloid comprises the structure:



46. The ligand of claim 41, wherein the alkaloid comprises the structure:







47. A method for detecting the presence of the polypeptide of claim 27 or 28 in a
sample, comprising contacting the sample with a ligand which binds the
polypeptide and detecting the binding of the polypeptide with the ligand in the
sample.

48. The method of claim 47, wherein the detecting comprises: 

a) contacting the sample with the ligand; and 

b) determining whether a polypeptide-ligand complex is so formed. 


49. The method of claim 47, wherein the sample is a cell, a tissue, or a biological
fluid. 

50. The method of claim 47, wherein the sample is blood, serum, a swab from nose, a 
swab from ear, or a swab from throat.

51. The method of claim 47, wherein the ligand is a diazalactone.

52. The method of claim 51, wherein the diazalactone comprises the structure:







53. The method of claim 47, wherein the ligand is an N-protected amino acid.

54. The method of claim 53, wherein the N-protected amino acid comprises the
structure:




55. The method of claim 47, wherein the ligand is an azabicyclodiene. 


56. The method of claim 55, wherein the azabicyclodiene comprises the structure: 




57. The ligand of claim 47 which is an alkaloid. 

58. The ligand of claim 57, wherein the alkaloid comprises the structure:




59. The ligand of claim 57, wherein the alkaloid comprises the structure: 




60. The ligand of claim 57, wherein the alkaloid comprises the structure: 




WO 01/49721 PCT/US00/35604



61. The ligand of claim 57, wherein the alkaloid comprises the structure: 




62. The ligand of claim 57, wherein the alkaloid comprises the structure: 




63. A method for detecting the presence of a target nucleic acid molecule as shown in 
any one of SEQ ID NOS: 1-113 in a sample, comprising contacting the sample 
with the complementary nucleic acid molecule of claim 19 and detecting the 
binding of the target nucleic acid molecule with the complementary nucleic acid
molecule in the sample. 
64. The method of claim 63, wherein the detecting comprises:

a) contacting the sample with the complementary nucleic acid molecule; and 

b) determining whether a complex comprising the target nucleic acid molecule 
and the complementary nucleic acid molecule is so formed.

65. The method of claim 63, wherein the sample is a cell, a tissue, or a biological 
fluid.

66. The method of claim 63, wherein the sample is blood, serum, a swab from nose, a 
swab from ear, or a swab from throat. 


67. A pharmaceutical composition comprising the nucleic acid molecule of claim 1,
16, 17, or 18. 

68. A pharmaceutical composition comprising the polypeptide of claim 27 or 28. 

69. A pharmaceutical composition comprising the ligand of claim 32.

70. A method for determining whether a genomic nucleotide sequence of interest is 
essential for viability of a bacterial cell, comprising

a. integrating an exogenous nucleotide sequence into the genomic nucleotide
sequence of interest, wherein the exogenous nucleotide sequence 
comprises a portion of an open reading frame of the genomic nucleotide 
sequence of interest, and 

b. determining whether the cell having the genomic nucleotide sequence of 
interest so integrated is viable. 


71. The method of claim 70, wherein the portion of the open reading frame comprises 
about 200 to 500 base pairs in length. 
72. The method of claim 70, wherein the exogenous nucleotide sequence further
comprises a nucleotide sequence conferring a selectable phenotype to the cell
having the genome so integrated.

73. The method of claim 70, wherein determining comprises selecting the cell having 
the genome so integrated in the presence of a selection agent. 

74. The method of claim 73, wherein the selection agent is chloramphenicol. 


75. A nucleotide sequence of interest which is essential for viability of a bacterial cell 
isolated by the method of claim 70. 

76. A bacterial cell comprising an exogenous nucleotide sequence integrated into the 
genomic nucleotide sequence of interest, generated by the method of claim 70. 


77. A method for determining whether a genomic nucleotide sequence of interest 
resides within an operon, comprising 

a) integrating an exogenous nucleotide sequence into the genomic nucleotide 
sequence of interest; and 

b) determining whether the cell having the genomic nucleotide sequence of 
interest so integrated is viable, and wherein the exogenous nucleotide
sequence lacks an expression regulatory sequence. 

78. The method of claim 77, wherein the exogenous nucleotide sequence further
comprises a nucleotide sequence conferring a selectable phenotype to the cell 
having the genome so integrated. 

79. The method of claim 77, wherein determining comprises selecting the cell having
the genome so integrated in the presence of a selection agent.
80. The method of claim 79, wherein the selection agent is chloramphenicol.

81. A method for inhibiting a function of a CEG polypeptide which is essential for 
viability of a bacterial cell, the method comprising contacting the CEG polypeptide 
with the ligand of claim 32 under suitable conditions thereby inhibiting the function 
of the CEG polypeptide. 


82. The method of claim 81, wherein the function of the CEG polypeptide is selected 
from a group consisting of a pantothenate kinase, a Holliday junction branch
migration protein, a single-stranded DNA binding protein, a phosphoglucosamine
mutase, an acetyltransferase, a uridylyltransferase, a malonyl-CoA:ACP
transacylase, a 3-oxoacyl-ACP synthase II, a 3-oxoacyl-ACP reductase, a
phosphomethylpyrimidine (HMP-P) kinase, a GTP binding protein, an ATP
binding protein, or a 4-aminoimidazole carboxylase.

83. The method of claim 81, wherein the CEG polypeptide is selected from a group
consisting of CFE1-113.

84. The method of claim 81, wherein the CEG polypeptide is 2CFE 34 shown in 
Figure 55. 

85. The method of claim 81, wherein the CEG polypeptide is 2CFE 43 shown in 
Figure 64. 
86. The method of claim 81, wherein the CEG polypeptide is 2CFE 34 shown in
Figure 55 and the ligand is: 




87. The method of claim 81, wherein the CEG polypeptide is 2CFE 43 shown in 
Figure 64 and the ligand is: 







88. The method of claim 81, wherein the CEG polypeptide is 2CFE 43 shown in 
Figure 64 and the ligand is: 
89. A method for identifying a ligand in a sample which specifically binds a CEG
polypeptide, the method comprising: 

a) contacting the CEG polypeptide with the sample under suitable conditions
so that a complex having the CEG polypeptide and the ligand is formed; 

b) recovering the complex so formed; and

c) separating the CEG polypeptide from the ligand in the complex and 
identifying the ligand so separated. 

90. The method of claim 89, wherein the sample is a tissue or biological fluid. 

91. The method of claim 89, wherein the ligand is an azabicyclodiene. 

92. The method of claim 91, wherein the azabicyclodiene comprises the structure:




93. The method of claim 89, wherein the ligand is a diazalactone. 


94. The method of claim 93, wherein the diazalactone comprises the structure:







95. The method of claim 89, wherein the ligand is an N-protected amino acid.
96. The method of claim 95, wherein the N-protected amino acid comprises the
structure:







97. The method of claim 89, wherein the ligand is an alkaloid.

98. The ligand of claim 97, wherein the alkaloid comprises the structure:





99. The ligand of claim 97, wherein the alkaloid comprises the structure: 
100. The ligand of claim 97, wherein the alkaloid comprises the structure:







101. The ligand of claim 97, wherein the alkaloid comprises the structure:

The ligand of claim 97, wherein the alkaloid comprises the structure:


